(57) Abstract

This invention relates to methods for detecting and sequencing
target nucleic acid sequences, and double-stranded nucleic acid
sequences, to nucleic acid probes, to mass modified nucleic acid
probes, to arrays of probes useful in these methods and to kits and
systems which contain these probes. Useful methods involve
hybridizing the nucleic acids or nucleic acids which represent
complementary or homologous sequences of the target to an array
of nucleic acid probes. These probes comprise a single-stranded
portion, an optional double-stranded portion and a variable
sequence within the single-stranded portion. The molecular weights
of the hybridized nucleic acids of the set can be determined by mass
spectroscopy, and the sequence of the target determined from the
molecular weights of the fragments. Nucleic acids whose sequences
can be determined include DNA or RNA in biological samples such
as patient biopsies and environmental samples. Probes may be fixed
to a solid support such as a hybridization chip to facilitate automated
molecular weight analysis and identification of the target sequence.
SOLID PHASE SEQUENCING OF BIOPOLYMERS

Rights in the Invention 

This invention was made with United States Government
support under grant number DE-FG-02-93ER61609, awarded by the United
States Department of Energy, and the United States Government has certain
rights in the invention.

Background of the Invention 

1. Field of the Invention

This invention relates to methods for detecting and sequencing
nucleic acids using sequencing by hybridization technology and molecular
weight analysis. The invention also relates to probes and arrays useful in
sequencing and detection and to kits and apparatus for determining sequence
information.

2. Description of the Background 

Since the recognition of nucleic acid as the carrier of the
genetic code, a great deal of interest has centered around determining the
sequence of that code in the many forms in which it is found. Two landmark
studies made the process of nucleic acid sequencing, at least with DNA, a
common and relatively rapid procedure practiced in most laboratories. The
first describes a process whereby terminally labeled DNA molecules are
chemically cleaved at single base repetitions (A.M. Maxam and W. Gilbert,
Proc. Natl. Acad. Sci. USA 74:560-64, 1977). Each base position in the
nucleic acid sequence is then determined from the molecular weights of
fragments produced by partial cleavages. Individual reactions were devised
to cleave preferentially at guanine, at adenine, at cytosine and thymine, and at
cytosine alone. When the products of these four reactions are resolved by
molecular weight, using, for example, polyacrylamide gel electrophoresis,
DNA sequences can be read from the pattern of fragments on the resolved
gel.

The second study describes a procedure whereby DNA is
sequenced using a variation of the plus-minus method (F. Sanger et al., Proc.
Natl. Acad. Sci. USA 74:5463-67, 1977). This procedure takes advantage
of the chain terminating ability of dideoxynucleoside triphosphates
(ddNTPs) and the ability of DNA polymerase to incorporate ddNTPs with
nearly equal fidelity as the natural substrate of DNA polymerase,
deoxynucleoside triphosphates (dNTPs). Briefly, a primer, usually an
oligonucleotide, and a template DNA are incubated together in the presence
of a useful concentration of all four dNTPs plus a limited amount of a single
ddNTP. The DNA polymerase occasionally incorporates a
dideoxynucleotide which terminates chain extension. Because the
dideoxynucleotide has no 3'-hydroxyl, the initiation point for the polymerase
enzyme is lost. Polymerization produces a mixture of fragments of varied
sizes, all having identical 3' termini. Fractionation of the mixture by, for
example, polyacrylamide gel electrophoresis, produces a pattern which
indicates the presence and position of each base in the nucleic acid.
Reactions with each of the four ddNTPs allow one of ordinary skill to read
an entire nucleic acid sequence from a resolved gel.

Despite their advantages, these procedures are cumbersome
and impractical when one wishes to obtain megabases of sequence
information. Further, these procedures are, for all practical purposes,
limited to sequencing DNA. Although variations have developed, it is still
not possible using either process to obtain sequence information directly
from any other form of nucleic acid.
A relatively new method for obtaining sequence information
from a nucleic acid has recently been developed whereby the sequences of
groups of contiguous bases are determined simultaneously. In comparison
to traditional techniques whereby one determines base specific information
of a sequence individually, this method, referred to as sequencing by
hybridization (SBH), represents a many-fold amplification in speed. Due,
at least in part, to the increased speed, SBH presents numerous advantages
including reduced expense and greater accuracy. Two general approaches
of sequencing by hybridization have been suggested and their practicality
has been demonstrated in pilot studies. In one format, a complete set of 4^n
oligonucleotides of length n is immobilized as an ordered array on a solid support
and an unknown DNA sequence is hybridized to this array (K.R. Khrapko
et al., J. DNA Sequencing and Mapping 1:375-88, 1991). The resulting
hybridization pattern provides all "n-tuple" words in the sequence. This is
sufficient to determine short sequences except for simple tandem repeats.
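
The n-tuple words that an ideal array of this first format would report can be illustrated by computing them from a known sequence. The following is only a minimal sketch of an idealized, error-free hybridization pattern and is not the procedure of the cited study; the function name `ntuple_spectrum` and the example sequence are assumptions introduced here.

```python
def ntuple_spectrum(sequence: str, n: int) -> set[str]:
    """Return every distinct n-base word occurring in `sequence`.

    Each word stands for one probe of the array that would give a
    positive signal under ideal (error-free) hybridization.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return {sequence[i:i + n] for i in range(len(sequence) - n + 1)}


if __name__ == "__main__":
    # Hypothetical 7-base target probed with an array of 3-tuples.
    print(sorted(ntuple_spectrum("ATGCATG", 3)))
```

Because the result records only the presence of a word, a word that occurs twice (ATG in the example) appears once, which is consistent with the remark that simple tandem repeats are not resolved.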

In the second format, an array of immobilized samples is
hybridized with one short oligonucleotide at a time (Z. Strezoska et al., Proc.
Natl. Acad. Sci. USA 88:10,089-93, 1991). When repeated 4^n times for each
oligonucleotide of length n, much of the sequence of all the immobilized
samples would be determined. In both approaches, the intrinsic power of
the method is that many sequenced regions are determined in parallel. In
actual practice the array size is about 10^4 to 10^5.

Another aspect of the method is that information obtained is
quite redundant, especially as the size of the nucleic acid probe grows.
Mathematical simulations have shown that the method is quite resistant to
experimental errors and that far fewer than all probes are necessary to
determine reliable sequence data (P.A. Pevzner et al., J. Biomol. Struc. &
Dyn. 9:399-410, 1991; W. Bains, Genomics 11:295-301, 1991).

In spite of an overall optimistic outlook, there are still a
number of potentially severe drawbacks to actual implementation of
sequencing by hybridization. First and foremost among these is that 4^n
rapidly becomes quite a large number if chemical synthesis of all of the
oligonucleotide probes is actually contemplated. Various schemes of
automating this synthesis and compressing the products into a small scale
array, a sequencing chip, have been proposed.
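
How quickly the number of probes grows with probe length can be tabulated directly. This is a short illustrative sketch; the function name `probe_set_size` and the range of lengths shown are choices made here, not values taken from the text.

```python
def probe_set_size(n: int) -> int:
    """Number of distinct oligonucleotides of length n over A, C, G and T."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return 4 ** n


if __name__ == "__main__":
    # Each added base multiplies the size of a complete probe set by four.
    for n in range(6, 13, 2):
        print(f"n = {n:2d}: {probe_set_size(n):,} probes")
```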

There is also a poor level of discrimination between
correctly hybridized, perfectly matched duplexes and end mismatches. In
part, these drawbacks have been addressed at least to a small degree by the
method of continuous stacking hybridization as reported by Khrapko et al.
(FEBS Lett. 256:118-22, 1989). Continuous stacking hybridization is based
upon the observation that when a single-stranded oligonucleotide is
hybridized adjacent to a double-stranded oligonucleotide, the two duplexes
are mutually stabilized as if they are positioned side-to-side due to a
stacking contact between them. The stability of the interaction decreases
significantly as stacking is disrupted by nucleotide displacement, gap or
terminal mismatch. Internal mismatches are presumably ignorable because
their thermodynamic stability is so much less than perfect matches.
Although promising, a related problem arises which is the inability to
distinguish between weak, but correct duplex formation, and simple
background such as non-specific adsorption of probes to the underlying
support matrix.
Detection is also monochromatic wherein separate sequential
positive and negative controls must be run to discriminate between a correct
hybridization match, a mis-match, and background. All too often,
ambiguities develop in reading sequences longer than a few hundred base
pairs on account of sequence recurrences. For example, if a sequence one
base shorter than the probe recurs three times in the target, the sequence
position cannot be uniquely determined. The locations of these sequence
ambiguities are called branch points.
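
The recurrence condition in the example above can be checked on a known sequence by counting words one base shorter than the probe. The sketch below is illustrative only; the function name `recurring_overlaps`, the default threshold of three occurrences (taken from the example in the text) and the test sequence are assumptions made here.

```python
from collections import Counter


def recurring_overlaps(target: str, probe_length: int, min_count: int = 3) -> dict[str, int]:
    """Find words one base shorter than the probe that recur in `target`.

    Returns a mapping of each such word to its number of occurrences,
    restricted to words seen at least `min_count` times.
    """
    k = probe_length - 1
    if k <= 0:
        raise ValueError("probe_length must be at least 2")
    counts = Counter(target[i:i + k] for i in range(len(target) - k + 1))
    return {word: c for word, c in counts.items() if c >= min_count}


if __name__ == "__main__":
    # Hypothetical target in which the 4-base word ACGT occurs three times,
    # probed with 5-base probes.
    print(recurring_overlaps("ACGTTACGTAACGTC", probe_length=5))
```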

Secondary structures often develop in the target nucleic acid
affecting accessibility of the sequences. This could lead to blocks of
sequences that are unreadable if the secondary structure is more stable than
occurs on the complementary strand.

A final drawback is the possibility that certain probes will
have anomalous behavior and, for one reason or another, be recalcitrant to
hybridization under whatever standard sets of conditions are ultimately used.
A simple example of this is the difficulty in finding matching conditions for
probes rich in G/C content. A more complex example could be sequences
with a high propensity to form triple helices. The only way to rigorously
explore these possibilities is to carry out extensive hybridization studies with
all possible oligonucleotides of length n under the particular format and
conditions chosen. This is clearly impractical if many sets of conditions are
involved.
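
The effect of G/C content on duplex stability can be illustrated with the Wallace rule of thumb, which estimates the melting temperature of a short oligonucleotide duplex as 2 °C per A or T plus 4 °C per G or C. This rule is not part of the present text and is only a rough approximation that depends on salt concentration and probe length; the function name `wallace_tm` and the example probes are assumptions made here.

```python
def wallace_tm(probe: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = probe.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("probe must contain only A, C, G and T")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    # Two hypothetical 8-base probes at the extremes of base composition.
    for probe in ("ATATATAT", "GCGCGCGC"):
        print(probe, wallace_tm(probe))
```

Under this approximation, probes of equal length can differ by a factor of two in estimated melting temperature, which suggests why a single set of hybridization conditions is difficult to choose for all probes of an array.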

Among the early publications which appeared discussing
sequencing by hybridization, E.M. Southern (WO 89/10977) described
methods whereby unknown, or target, nucleic acids are labeled, hybridized
to a set of nucleotides of chosen length on a solid support, and the nucleotide
sequence of the target determined, at least partially, from knowledge of the
sequence of the bound fragments and the pattern of hybridization observed.
Although promising, as a practical matter, this method has numerous
drawbacks. Probes are entirely single-stranded and binding stability is
dependent upon the size of the duplex. However, every additional
nucleotide of the probe necessarily increases the size of the array
four-fold, creating a dichotomy which severely restricts its plausible use. Further,
there is an inability to deal with branch point ambiguities or secondary
structure of the target, and hybridization conditions will have to be tailored
or in some way accounted for with each binding event. Attempts have been made
to overcome or circumvent these problems.

R. Drmanac et al. (U.S. Patent No. 5,202,231) is directed to
methods for sequencing by hybridization using sets of oligonucleotide
probes with random or variable sequences. These probes, although useful,
suffer from some of the same drawbacks as the methodology of Southern
(1989), and like Southern, fail to recognize the advantages of stacking
interactions.

K.R. Khrapko et al. (FEBS Lett. 256:118-22, 1989; and J.
DNA Sequencing and Mapping 1:375-88, 1991) attempt to address some of
these problems using a technique referred to as continuous stacking
hybridization. With continuous stacking, conceptually, the entire sequence
of a target nucleic acid can be determined. Basically, the target is
hybridized to an array of probes, again single-stranded, denatured from the
array, and the dissociation kinetics of denaturation analyzed to determine the
target sequence. Although also promising, discrimination between matches
and mis-matches (and simple background) is low and, further, as



WO 96/32504 



PCT/US96/05136




hybridization conditions are inconstant for each duplex, discrimination 
becomes increasingly reduced with increasing target complexity. 

Another major problem with current sequencing formats is the
inability to efficiently detect sequence information. In conventional
procedures, individual sequences are separated by, for example,
electrophoresis using capillary or slab gels. This step is slow, expensive and
requires the talents of a number of highly trained individuals, and, more
importantly, is prone to error. One attempt to overcome these difficulties
has been to utilize the technology of mass spectrometry.

Mass spectrometry of organic molecules was made possible
by the development of instruments able to volatize large varieties of organic
compounds and by the discovery that the molecular ion formed by
volatization breaks down into charged fragments whose structures can be
related to the intact molecule. Although the process itself is relatively
straightforward, actual implementation is quite complex. Briefly, the
sample molecule or analyte is volatized and the resulting vapor passed into
an ion chamber where it is bombarded with electrons accelerated to a
compatible energy level. Electron bombardment ionizes the molecules of
the sample analyte, and the ions formed are then directed to a mass analyzer. The
mass analyzer, with its combination of electrical and magnetic fields,
separates impacting ions according to their mass/charge (m/e) ratios. From
these ratios, the molecular weights of the impacting ions can be determined
and the structure and molecular weight of the analyte determined. The
entire process requires less than about 20 microseconds.

Attempts to apply mass spectrometry to the analysis of
biomolecules such as proteins and nucleic acids have been disappointing.
Mass spectrometric analysis has traditionally been limited to molecules with
molecular weights of a few thousand daltons. At higher molecular weights,
samples become increasingly difficult to volatize and large polar molecules
generally cannot be vaporized without catastrophic consequences. The
energy requirement is so significant that the molecule is destroyed or, even
worse, fragmented. Mass spectra of fragmented molecules are often
difficult or impossible to read. Fragment linking order, particularly useful
for reconstructing a molecular structure, has been lost in the fragmentation
process. Both signal to noise ratio and resolution are significantly
negatively affected. In addition, and specifically with regard to
biomolecular sequencing, extreme sensitivity is necessary to detect the
single base differences between biomolecular polymers to determine
sequence identity.

A number of new methods have been developed based on the
idea that heat, if applied with sufficient rapidity, will vaporize the sample
biomolecule before decomposition has an opportunity to take place. This
rapid heating technique is referred to as plasma desorption and there are
many variations. For example, one method of plasma desorption involves
placing a radioactive isotope such as Californium-252 on the surface of a
sample analyte which forms a blob of plasma. From this plasma, a few ions
of the sample molecule will emerge intact. Field desorption ionization,
another form of desorption, utilizes strong electrostatic fields to literally
extract ions from a substrate. In secondary ionization mass spectrometry or
fast ion bombardment, an analyte surface is bombarded with electrons which
encourage the release of intact ions. Fast atom bombardment involves
bombarding a surface with accelerated ions which are neutralized by a
charge exchange before they hit the surface. Presumably, neutralization of
the charge lessens the probability of molecular destruction, but not the
creation of ionic forms of the sample. In laser desorption, photons comprise
the vehicle for depositing energy on the surface to volatize and ionize
molecules of the sample. Each of these techniques has had some measure
of success with different types of sample molecules. Recently, there have
also been a variety of techniques and combinations of techniques
specifically directed to the analysis of nucleic acids.

Brennan et al. used nuclide markers to identify terminal
nucleotides in a DNA sequence by mass spectrometry (U.S. Patent No.
5,003,059). Stable nuclides, detectable by mass spectrometry, were placed
in each of the four dideoxynucleotides used as reagents to polymerize cDNA
copies of the target DNA sequence. Polymerized copies were separated
electrophoretically by size and the terminal nucleotide identified by the
presence of the unique label.

Fenn et al. describe a process for the production of a mass
spectrum containing a multiplicity of peaks (U.S. Patent No. 5,130,538).
Peak components comprised multiply charged ions formed by dispersing a
solution containing an analyte into a bath gas of highly charged droplets.
An electrostatic field charged the surface of the solution and dispersed the
liquid into a spray referred to as an electrospray (ES) of charged droplets.
This nebulization provided a high charge/mass ratio for the droplets,
increasing the upper limit of volatization. Detection was still limited to less
than about 100,000 daltons.

Jacobson et al. utilize mass spectrometry to analyze a DNA
sequence by incorporating stable isotopes into the sequence (U.S. Patent No.
5,002,868). Incorporation required the steps of enzymatically introducing
the isotope into a strand of DNA at a terminus, electrophoretically
separating the strands to determine fragment size and analyzing the
separated strand by mass spectrometry. Although accuracy was stated to
have been increased, electrophoresis was necessary to isolate the labeled
strand.

Brennan also utilized stable markers to label the terminal
nucleotides in a nucleic acid sequence, but added the step of completely
degrading the components of the sample prior to analysis (U.S. Patent Nos.
5,003,059 and 5,174,962). Nuclide markers, enzymatically incorporated
into either dideoxynucleotides or nucleic acid primers, were
electrophoretically separated. Bands were collected and subjected to
combustion and passed through a mass spectrometer. Combustion converts
the DNA into oxides of carbon, hydrogen, nitrogen and phosphorus, and
the label into sulfur dioxide. Labeled combustion products were identified
and the mass of the initial molecule reconstructed. Although fairly accurate,
the process does not lend itself to large scale sequencing of biopolymers.

A recent advancement in the mass spectrometric analysis of
high molecular weight molecules in biology has been the development of
time of flight mass spectrometry (TOF-MS) with matrix-assisted laser
desorption ionization (MALDI). This process involves placing the sample
into a matrix which contains molecules which assist in the desorption
process by absorbing energy at the frequency used to desorb the sample.
The theory is that volatization of the matrix molecules encourages
volatization of the sample without significant destruction. Time of flight
analysis utilizes the travel time or flight time of the various ionic species as
an accurate indicator of molecular mass. There have been some notable
successes with these techniques.
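
The relationship between flight time and mass can be sketched from elementary physics: an ion of charge z accelerated through a potential V gains kinetic energy zeV = mv²/2 and then drifts a length L at constant speed, so its flight time grows with the square root of m/z. The example below is an idealized illustration only; the function name `flight_time_s` and the default accelerating voltage and drift length are assumptions made here, not parameters of any instrument described in the text.

```python
import math

ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs
DALTON_KG = 1.66053906660e-27          # kilograms per dalton


def flight_time_s(mass_da: float, charge: int = 1,
                  accel_voltage_v: float = 20_000.0,
                  drift_length_m: float = 1.0) -> float:
    """Idealized drift time of an ion in a linear time-of-flight analyzer.

    Assumes all kinetic energy comes from the accelerating potential and
    ignores the time spent in the acceleration region itself.
    """
    if min(mass_da, charge, accel_voltage_v, drift_length_m) <= 0:
        raise ValueError("all arguments must be positive")
    mass_kg = mass_da * DALTON_KG
    velocity = math.sqrt(2 * charge * ELEMENTARY_CHARGE_C * accel_voltage_v / mass_kg)
    return drift_length_m / velocity


if __name__ == "__main__":
    # Hypothetical singly charged ions of increasing mass.
    for mass in (1_000.0, 4_000.0, 16_000.0):
        print(f"{mass:>8.0f} Da: {flight_time_s(mass) * 1e6:.2f} microseconds")
```

In this idealized model, quadrupling the mass doubles the flight time, which is the basis for using arrival time as an indicator of molecular mass.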

Beavis et al. proposed to measure the molecular weights of
DNA fragments in mixtures prepared by either Maxam-Gilbert or Sanger
sequencing techniques (U.S. Patent No. 5,288,644). Each of the different
DNA fragments to be generated would have a common origin and terminate
at a particular base along an unknown sequence. The separate mixtures
would be analyzed by laser desorption time of flight mass spectroscopy to
determine fragment molecular weights. Spectra obtained from each reaction
would be compared using computer algorithms to determine the location of
each of the four bases and ultimately, the sequence of the fragment.
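
The idea of reading bases from fragment molecular weights can be illustrated with a simplified model in which successive nested fragments differ by exactly one nucleotide residue. This sketch is not the algorithm of the cited patent: it uses a single combined ladder rather than four separate reaction mixtures, ignores the mass difference of chain terminators, and assumes noise-free measurements. The function names, the tolerance and the primer mass are hypothetical; the residue masses are standard average values for deoxynucleotide residues within a DNA chain.

```python
# Average masses (daltons) of deoxynucleotide residues within a DNA chain
# (nucleoside monophosphate minus water).
RESIDUE_MASS_DA = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def ladder_masses(extension: str, primer_mass_da: float) -> list[float]:
    """Masses of the primer and of each successively longer nested fragment."""
    masses = [primer_mass_da]
    for base in extension:
        masses.append(masses[-1] + RESIDUE_MASS_DA[base])
    return masses


def read_ladder(masses: list[float], tolerance_da: float = 1.0) -> str:
    """Infer the extension sequence from mass steps between nested fragments."""
    ordered = sorted(masses)
    bases = []
    for lighter, heavier in zip(ordered, ordered[1:]):
        step = heavier - lighter
        # Pick the residue whose mass is closest to the observed step.
        base, residue = min(RESIDUE_MASS_DA.items(), key=lambda kv: abs(kv[1] - step))
        if abs(residue - step) > tolerance_da:
            raise ValueError(f"mass step {step:.2f} Da matches no nucleotide")
        bases.append(base)
    return "".join(bases)


if __name__ == "__main__":
    # Hypothetical primer mass and extension sequence.
    ladder = ladder_masses("GATTACA", primer_mass_da=5000.0)
    print(read_ladder(ladder))
```

The smallest difference between two residue masses in this table is about 9 Da (A versus T), which indicates the mass accuracy such a readout would need as fragments grow larger.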

Williams et al. utilized a combination of pulsed laser ablation,
multiphoton ionization and time of flight mass spectrometry. Effective laser
desorption was accomplished by ablating a frozen film of a solution
containing sample molecules. When ablated, the film produces an
expanding vapor plume which entrains the intact molecules for analysis by
mass spectrometry.

Even more recent developments in mass spectrometry have
further increased the upper limits of molecular weight detection and
determination. Mass spectrograph systems with reflectors in the flight tube
have effectively doubled resolution. Reflectors also compensate for errors
in mass caused by the fact that the ionized/accelerated region of the
instrument is not a point source, but an area of finite size wherein ions can
accelerate at any point. Spatial differences between the origination
points of the particles, problematic in conventional instruments because
arrival times at the detector will vary, are overcome. Particles that spend
more time in the accelerating field will also spend more time in the retarding
field. Therefore, particles emerging from the reflector are mostly
synchronous, vastly improving resolution.

Despite these advances, it is still not possible to generate
coordinated spectra representing a continuous sequence. Furthermore,
throughput is sufficiently slow so as to make these methods impractical for
large scale analysis of sequence information.

Summary of the Invention 

The present invention overcomes the problems and
disadvantages associated with current strategies and designs and provides
methods, kits and apparatus for determining the sequence of target nucleic
acids.

One embodiment of the invention is directed to methods for
sequencing a target nucleic acid. A set of nucleic acid fragments containing
a sequence which is complementary or homologous to a sequence of the
target is hybridized to an array of nucleic acid probes wherein each probe
comprises a double-stranded portion, a single-stranded portion and a
variable sequence within said single-stranded portion, forming a target array
of nucleic acids. Molecular weights for a plurality of nucleic acids of the
target array are determined and the sequence of the target constructed.
Nucleic acids of the target, the target sequence, the set and the probes may
be DNA, RNA or PNA comprising purine, pyrimidine or modified bases.
The probes may be fixed to a solid support such as a hybridization chip to
facilitate automated determination of molecular weights and identification
of the target sequence.
Another embodiment of the invention is directed to methods
for sequencing a target nucleic acid. A set of nucleic acid fragments
containing a sequence which is complementary or homologous to a
sequence of the target is hybridized to an array of nucleic acid probes
forming a target array containing a plurality of nucleic acid complexes. One
strand of those probes hybridized by a fragment is extended using the
fragment as a template. Molecular weights for a plurality of nucleic acids
of the target array are determined and the sequence of the target constructed.
Strands can be enzymatically extended using chain terminating and chain
elongating nucleotides. The resulting nested set of nucleic acids represents
the sequence of the target.

Another embodiment of the invention is directed to methods
for detecting a target nucleic acid. A set of nucleic acids complementary to
a sequence of the target is hybridized to a fixed array of nucleic acid probes.
The molecular weights of the hybridized nucleic acids are determined by
mass spectrometry and a sequence of the target can be identified. Target
nucleic acids may be obtained from biological samples such as patient
samples wherein detection of the target is indicative of a disorder in the
patient, such as a genetic defect, a neoplasm or an infection.

Another embodiment of the invention is directed to methods
for sequencing a target nucleic acid. A sequence of the target is cleaved into
nucleic acid fragments and the fragments hybridized to an array of nucleic
acid probes. Fragments are created by enzymatically or physically cleaving
the target and the sequence of the fragments is homologous with or
complementary to at least a portion of the target sequence. The array is
attached to a solid support and the molecular weights of the hybridized
fragments determined by mass spectrometry. From the molecular weights
determined, nucleotide sequences of the hybridized fragments are
determined and a nucleotide sequence of the target can be identified.

Another embodiment of the invention is directed to methods
for sequencing a target nucleic acid. A set of nucleic acids complementary
to a sequence of the target is hybridized to an array of single-stranded
nucleic acid probes wherein each probe comprises a constant sequence and
a variable sequence and said variable sequence is determinable. The
molecular weights of the hybridized nucleic acids are determined and the
sequence of said target identified. The array comprises less than or equal to
about 4^R different probes, wherein R is the length in nucleotides of the variable
sequence, and may be attached to a solid support.

Another embodiment of the invention is directed to methods
for sequencing a target nucleic acid by strand-displacement, double-stranded
sequencing. A set of partially single-stranded and partially double-stranded
nucleic acid fragments is provided wherein each fragment contains a
sequence that corresponds to a sequence of the target. These nucleic acid
fragments are hybridized to a set of partially single-stranded and partially
double-stranded nucleic acid probes, via the single-stranded regions of each,
to form a set of fragment/probe complexes. Prior to hybridization, either the
fragments or the probes may be treated with a phosphorylase to remove
phosphate groups from the 5'-termini of the nucleic acids. 5'-termini are
ligated with adjacent 3'-termini of the complex forming a common single
strand. The complementary unligated strand contains a nick which is
recognized by a nucleic acid polymerase that initiates strand-displacement
polymerization, extending the unligated strand. Polymerization proceeds,
using the ligated strand as a template, in the presence of labeled nucleotides
such as mass modified nucleotides. The sequence of the target can be
determined by mass spectrometry from the molecular weights of the
extended strands. This process can be used to sequence target nucleic acids
and also to identify a single sequence in a mixed background. Selection of
the species of nucleic acid to be sequenced occurs upon hybridization to the
probe. As only fragments complementary to the single-stranded region of
the probe will form complexes, only those fragment complexes are
sequenced.

Another embodiment of the invention is directed to arrays of
nucleic acid probes. In these arrays, each probe comprises a first strand and
a second strand wherein the first strand is hybridized to the second strand
forming a double-stranded portion, a single-stranded portion and a variable
sequence within the single-stranded portion. The array may be attached to
a solid support such as a material that facilitates volatization of nucleic acids
for mass spectrometry. Arrays can be fixed to hybridization chips
containing less than or equal to about 4^R different probes wherein R is the
length in nucleotides of the variable sequence. Arrays can be used in
detection methods and in kits to detect nucleic acid sequences which may
be indicative of a disorder and in sequencing systems such as sequencing by
mass spectrometry.

Another embodiment of the invention is directed to arrays of
single-stranded nucleic acid probes wherein each probe of the array
comprises a constant sequence and a variable sequence which is
determinable. Arrays may be attached to solid supports which comprise
matrices that facilitate volatization of nucleic acids for mass spectrometry.




Arrays, generated by conventional processes, may be characterized using the 
above methods and replicated in mass for use in nucleic acid detection and 
sequencing systems. 

Another embodiment of the invention is directed to kits for
detecting a sequence of a target nucleic acid. Kits contain arrays of nucleic
acid probes fixed to a solid support wherein each probe comprises a
double-stranded portion, a single-stranded portion and a variable sequence within
said single-stranded portion. The solid support may be, for example, coated
with a matrix that facilitates volatization of nucleic acids for mass
spectrometry such as an aqueous composition.

Another embodiment of the invention is directed to mass
spectrometry systems for the rapid sequencing of nucleic acids. Systems
comprise a mass spectrometer, a computer with appropriate software and
probe arrays which can be used to capture and sort nucleic acid sequences
for subsequent analysis by mass spectrometry.

Other embodiments and advantages of the invention are set 
forth, in part, in the description which follows and, in part, will be obvious 
from this description and may be learned from the practice of the invention. 



Description of the Drawings

Figure 1 (A) Schematic of a mass modified nucleic acid primer; and
          (B) primer mass modification moieties.
Figure 2 (A) Schematic of mass modified nucleoside triphosphate
          elongators and terminators; and (B) nucleoside triphosphate
          mass modification moieties.
Figure 3 List of mass modification moieties.
Figure 4 List of mass modification moieties.
Figure 5 Cleavage site of MwoI indicating bidirectional sequencing.
Figure 6 Schematic of sequencing strategy after target DNA digestion
          by TspRI.
Figure 7 Calculated Tm of matched and mismatched complementary
          DNA.
Figure 8 Replication of a master array.
Figure 9 Reaction scheme for the covalent attachment of DNA to a
          surface.
Figure 10 Target nucleic acid capture and ligation.
Figure 11 Ligation efficiency of matches as compared to mismatches.
Figure 12 (A) Ligation of target DNA with probe attached at 5'-terminus;
          and (B) ligation of target DNA with probe attached
          at the 3'-terminus.
Figure 13 Gel reader sequencing results from primer hybridization
          analysis.
Figure 14 Mass spectrometry of oligonucleotide ladder.
Figure 15 Schematic of mass modification by alkylation.
Figure 16 Mass spectrum of 17-mer target with 0, 1 or 2 mass modified
          moieties.
Figure 17 Schematic of nicked strand displacement sequencing with
          immobilized template.
Figure 18 Analysis of sequencing reaction in the presence and absence
          of single-stranded DNA binding protein.
Figure 19 Schematic of nicked strand displacement sequencing with
          immobilized probe.
Figure 20 Results of sequencing performed using DF27-1 as a probe.
Figure 21 Results of sequencing performed using DF27-2 as a probe.
Figure 22 Results of sequencing performed using DF27-4 as a probe.
Figure 23 Results of sequencing performed using DF27-5-CY5 as a
          probe.
Figure 24 Results of sequencing performed using DF27-6-CY5 as a
          probe.



Description of the Invention

As embodied and broadly described herein, the present
invention is directed to methods for sequencing a nucleic acid, probe arrays
useful for sequencing by mass spectrometry and kits and systems which
comprise these arrays.

Nucleic acid sequencing, on both a large and small scale, is
critical to many aspects of medicine and biology such as, for example, in the
identification, analysis or diagnosis of diseases and disorders, and in
determining relationships between living organisms. Conventional
sequencing techniques rely on a base-by-base identification of the sequence
using electrophoresis in a semi-solid such as an agarose or polyacrylamide
gel to determine sequence identity. Although attempts have been made to
apply mass spectrometric analysis to these methods, the two processes are
not well suited to each other because, at least in part, information is still
being gathered in a single base format. Sequencing-by-hybridization
methodology has enhanced the sequencing process and provided a more
optimistic outlook for more rapid sequencing techniques; however, this
methodology is no more applicable to mass spectrometry than traditional
sequencing techniques.

In contrast, positional sequencing by hybridization (PSBH),
with its ability to stably bind and discriminate different sequences with large
or small arrays of probes, is well suited to mass spectrometric analysis.
Sequence information is rapidly determined in batches and with a minimum
of effort. Such processes can be used both for sequencing unknown nucleic
acids and for detecting known sequences whose presence may be an
indicator of a disease or contamination. Additionally, these processes can
be utilized to create coordinated patterns of probe arrays with known
sequences. Determination of the sequence of fragments hybridized to the
probes also reveals the sequence of the probe. These processes are currently
not possible with conventional techniques and, further, a coordinated
batch-type analysis provides a significant increase in sequencing speed and
accuracy which is expected to be required for effective large scale
sequencing operations.

PSBH is also well suited to nucleic acid analysis wherein
sequence information is not obtained directly from hybridization. Sequence
information can be learned by coupling PSBH with techniques such as mass
spectrometry. Target nucleic acid sequences can be hybridized to probes or
arrays of probes as a method of sorting nucleic acids having distinct
sequences without having a priori knowledge of the sequences of the
various hybridization events. As each probe will be represented as multiple
copies, it is only necessary that hybridization has occurred to isolate distinct
sequence packages. In addition, as distinct packages of sequences, they can
be amplified, modified or otherwise controlled for subsequent analysis.
Amplification increases the number of specific sequences, which assists in
any analysis requiring increased quantities of nucleic acid while retaining
sequence specificity. Modification may involve chemically altering the
nucleic acid molecule to assist with later or downstream analysis.

Consequently, another important feature of the invention is the
ability to simply and rapidly mass modify the sequences of interest. A mass
modification is an alteration in the mass, typically measured in terms of
molecular weight as daltons, of a molecule. Mass modifications which
increase the discrimination between at least two nucleic acids with single
base differences in size or sequence can be used to facilitate sequencing
using, for example, molecular weight determinations.

One embodiment of the invention is directed to a method for
sequencing a target nucleic acid using mass modified nucleic acids and mass
spectrometry technology. Target nucleic acids which can be sequenced
include sequences of deoxyribonucleic acid (DNA) or ribonucleic acid
(RNA). Such sequences may be obtained from biological, recombinant or
other man-made sources, or purified from a natural source such as a patient's
tissue or obtained from environmental sources. Alternate types of molecules
which can be sequenced include polyamide nucleic acid (PNA) (P.E.
Nielsen et al., Sci. 254:1497-1500, 1991) or any sequence of bases joined
by a chemical backbone that has the ability to base pair or hybridize with
a complementary chemical structure.

The bases of DNA, RNA and PNA include purines,
pyrimidines and purine and pyrimidine derivatives and modifications, which
are linearly linked to a chemical backbone. Common chemical backbone
structures are deoxyribose phosphate, ribose phosphate, and polyamide. The
purines of both DNA and RNA are adenine (A) and guanine (G). Others
that are known to exist include xanthine, hypoxanthine, 2- and
1-diaminopurine, and other more modified bases. The pyrimidines are
cytosine (C), which is common to both DNA and RNA, uracil (U), found
predominantly in RNA, and thymine (T), which occurs almost exclusively
in DNA. Some of the more atypical pyrimidines include methylcytosine,
hydroxymethyl-cytosine, methyluracil, hydroxymethyluracil,
dihydroxypentyluracil, and other base modifications. These bases interact
in a complementary fashion to form base-pairs, such as, for example,
guanine with cytosine and adenine with thymine. This invention also
encompasses situations in which there is non-traditional base pairing such
as Hoogsteen base pairing which has been identified in certain tRNA
molecules and postulated to exist in a triple helix.

Sequencing involves providing a nucleic acid sequence which
is homologous or complementary to a sequence of the target. Sequences
may be chemically synthesized using, for example, phosphoramidite
chemistry or created enzymatically by incubating the target in an appropriate
buffer with chain elongating nucleotides and a nucleic acid polymerase.
Initiation and termination sites can be controlled with dideoxynucleotides
or oligonucleotide primers, or by placing coded signals directly into the
nucleic acids. The sequence created may comprise any portion of the target
sequence or the entire sequence. Alternatively, sequencing may involve
elongating DNA in the presence of boron derivatives of nucleotide
triphosphates. Resulting double-stranded samples are treated with a 3'
exonuclease such as exonuclease III. This exonuclease stops when it
encounters a boronated residue thereby creating a sequencing ladder.

Nucleic acids can also be purified, if necessary, to remove
substances which could be harmful (e.g. toxins), dangerous (e.g. infectious)
or might interfere with the hybridization reaction or the sensitivity of that
reaction (e.g. metals, salts, protein, lipids). Purification may involve
techniques such as chemical extraction with salts, chloroform or phenol,
sedimentation centrifugation, chromatography or other techniques known
to those of ordinary skill in the art.

If sufficient quantities of target nucleic acid are available and
the nucleic acids are sufficiently pure or can be purified so that any
substances which would interfere with hybridization are removed, a plurality
of target nucleic acids may be directly hybridized to the array. Sequence
information can be obtained without creating complementary or homologous
copies of a target sequence.

Sequences may also be amplified, if necessary or desired, to
increase the number of copies of the target sequence using, for example,
polymerase chain reaction (PCR) technology or any other amplification
procedure. Amplification involves denaturation of template DNA by
heating in the presence of a large molar excess of each of two or more
oligonucleotide primers and four dNTPs (dGTP, dCTP, dATP, dTTP). The
reaction mixture is cooled to a temperature that allows the oligonucleotide
primers to anneal to target sequences, after which the annealed primers are
extended with DNA polymerase. The cycle of denaturation, annealing, and
DNA synthesis, the principle of PCR amplification, is repeated many times
to generate large quantities of product which can be easily identified.

The major product of this exponential reaction is a segment of
double stranded DNA whose termini are defined by the 5' termini of the
oligonucleotide primers and whose length is defined by the distance between
the primers. Under normal reaction conditions, the amount of polymerase
becomes limiting after 25 to 30 cycles or about one million fold
amplification. Further amplification is achieved by diluting the sample
1000 fold and using it as the template for further rounds of amplification in
another PCR. By this method, amplification levels of 10^9 to 10^10 can be
achieved during the course of 60 sequential cycles. This allows for the
detection of a single copy of the target sequence in the presence of
contaminating DNA, for example, by hybridization with a radioactive probe.
With the use of sequential PCR, the practical detection limit of PCR can be
as low as 10 copies of DNA per sample.
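
By way of illustration only, the amplification arithmetic above can be sketched with a simple exponential model in which each cycle multiplies the product by (1 + efficiency). The model and the function names are assumptions introduced here, not part of the described method. Under ideal doubling about 20 cycles would give a million fold gain, so reaching that level only after 25 to 30 cycles corresponds, in this model, to an average per-cycle efficiency of roughly 0.58 to 0.74.

```python
def fold_amplification(cycles, efficiency=1.0):
    """Fold increase after `cycles` rounds, assuming a constant per-cycle
    efficiency (1.0 means perfect doubling). Hypothetical helper."""
    return (1.0 + efficiency) ** cycles


def implied_efficiency(fold, cycles):
    """Average per-cycle efficiency that would yield `fold` in `cycles`."""
    return fold ** (1.0 / cycles) - 1.0


if __name__ == "__main__":
    print(fold_amplification(20))            # ideal doubling for 20 cycles
    for n in (25, 30):
        print(n, round(implied_efficiency(1e6, n), 3))
```

The constant-efficiency assumption is a simplification; in practice efficiency falls as reagents such as polymerase become limiting, which is the reason given above for diluting the product and starting a second reaction.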

Although PCR is a reliable method for amplification of target
sequences, a number of other techniques can be used such as ligase chain
reaction, self sustained sequence replication, Qβ replicase amplification,
polymerase chain reaction linked ligase chain reaction, gapped ligase chain
reaction, ligase chain detection and strand displacement amplification. The
principle of ligase chain reaction is based in part on the ligation of two
adjacent synthetic oligonucleotide primers which uniquely hybridize to one
strand of the target DNA or RNA. If the target is present, the two
oligonucleotides can be covalently linked by ligase. A second pair of
primers, almost entirely complementary to the first pair of primers, is also
provided. The template and the four primers are placed into a thermocycler
with a thermostable ligase. As the temperature is raised and lowered,
oligonucleotides are renatured immediately adjacent to each other on the
template and ligated. The ligated product of one reaction serves as the
template for a subsequent round of ligation. The presence of target is
manifested as a DNA fragment with a length equal to the sum of the two
adjacent oligonucleotides.

Target sequences are fragmented, if necessary, into a plurality
of fragments using physical, chemical or enzymatic means to create a set of
fragments of uniform or relatively uniform length. Preferably, the
sequences are enzymatically cleaved using nucleases such as DNases or
RNases (mung bean nuclease, micrococcal nuclease, DNase I, RNase A,
RNase T1), type I or II restriction endonucleases, or other site-specific or
non-specific endonucleases. Sizes of nucleic acid fragments are between
about 5 to about 1,000 nucleotides in length, preferably between about 10
to about 200 nucleotides in length, and more preferably between about 12
to about 100 nucleotides in length. Sizes in the range of about 5, 10, 12, 15,
18, 20, 24, 26, 30 and 35 are useful to perform small scale analysis of short
regions of a nucleic acid target. Fragment sizes in the range of 25, 50, 75,
125, 150, 175, 200 and 250 nucleotides and larger are useful for rapidly
analyzing larger target sequences.

Target sequences may also be enzymatically synthesized
using, for example, a nucleic acid polymerase and a collection of chain
elongating nucleotides (NTPs, dNTPs) and limiting amounts of chain
terminating nucleotides (ddNTPs). This type of polymerization reaction can
be controlled by varying the concentration of chain terminating nucleotides
to create sets, for example nested sets, which span various size ranges. In
a nested set, fragments will have one terminus in common and one terminus
which differs between the members of the set, such that the larger
fragments will contain the sequences of the smaller fragments.
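
The nested-set property described above can be illustrated with a short sketch. It assumes, for illustration only, that the synthesized strand is written 5' to 3' as a string and that termination can occur at every occurrence of one chosen base; the function name `nested_set` is hypothetical.

```python
def nested_set(product, terminator):
    """Return every prefix of `product` that ends at an occurrence of
    `terminator`, i.e. the fragments a single chain-terminating
    nucleotide could yield. All fragments share the same 5' terminus."""
    return [product[: i + 1]
            for i, base in enumerate(product)
            if base == terminator]


if __name__ == "__main__":
    fragments = nested_set("ACGTAC", "A")
    print(fragments)
    # Each larger fragment contains the sequence of each smaller one.
    for shorter, longer in zip(fragments, fragments[1:]):
        assert longer.startswith(shorter)
```

In an actual reaction the relative abundance of each fragment depends on the ratio of chain terminating to chain elongating nucleotides, which this sketch does not model.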

The set of fragments created, which may be either homologous
or complementary to the target sequence, is hybridized to an array of nucleic
acid probes forming a target array of nucleic acid probe/fragment
complexes. An array constitutes an ordered or structured plurality of nucleic
acids which may be fixed to a solid support or in liquid suspension.
Hybridization of the fragments to the array allows for sorting of very large
collections of nucleic acid fragments into identifiable groups. Sorting does
not require a priori knowledge of the sequences of the probes, and can
greatly facilitate analysis by, for example, mass spectrometric
techniques.

Hybridization between complementary bases of DNA, RNA,
PNA, or combinations of DNA, RNA and PNA, occurs under a wide variety
of conditions such as variations in temperature, salt concentration,
electrostatic strength, and buffer composition. Examples of these conditions
and methods for applying them are described in Nucleic Acid Hybridization:
A Practical Approach (B.D. Hames and S.J. Higgins, editors, IRL Press,
1985). It is preferred that hybridization takes place between about 0 °C and
about 70 °C, for periods of from about one minute to about one hour,
depending on the nature of the sequence to be hybridized and its length.
However, it is recognized that hybridizations can occur in seconds or hours,
depending on the conditions of the reaction. For example, typical
hybridization conditions for a mixture of two 20-mers are to bring the mixture
to 68 °C and let it cool to room temperature (22 °C) for five minutes, or to
hybridize at very low temperatures such as 2 °C in 2 microliters.
Hybridization between nucleic acids may be facilitated using buffers such as
Tris-EDTA (TE), Tris-HCl and HEPES, salt solutions (e.g. NaCl, KCl,
CaCl2), other aqueous solutions, reagents and chemicals. Examples of these
reagents include single-stranded binding proteins such as Rec A protein, T4
gene 32 protein, E. coli single-stranded binding protein and major or minor
nucleic acid groove binding proteins. Examples of other reagents and
chemicals include divalent ions, polyvalent ions and intercalating substances
such as ethidium bromide, actinomycin D, psoralen and angelicin.
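
As a rough illustration of why suitable hybridization temperatures depend on the nature and length of the sequence, the following sketch applies the Wallace rule, a common rule of thumb for short oligonucleotides that estimates the melting temperature as 2 °C per A or T plus 4 °C per G or C. The rule is brought in here as an assumption for illustration; it is not the calculation used elsewhere in this description, and the function name is hypothetical.

```python
def wallace_tm(oligo):
    """Rule-of-thumb melting temperature (deg C) for a short DNA oligo:
    2 degrees per A/T plus 4 degrees per G/C. An approximation only."""
    oligo = oligo.upper()
    at = sum(oligo.count(base) for base in "AT")
    gc = sum(oligo.count(base) for base in "GC")
    if at + gc != len(oligo):
        raise ValueError("sequence must contain only A, C, G and T")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for seq in ("ATATATATAT", "GCGCGCGCGC", "ATGCATGCAT"):
        print(seq, wallace_tm(seq))
```

Two probes of equal length can therefore differ substantially in estimated stability, which is why a single set of hybridization conditions is difficult to apply across an array of varied base composition.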

Optionally, hybridized target sequences may be ligated to a
single strand of the probes thereby creating ligated target-probe complexes
or ligated target arrays. Ligation of target nucleic acid to probe increases
fidelity of hybridization and allows for incorrectly hybridized target to be
easily washed from correctly hybridized target. More importantly, the
addition of a ligation step allows for hybridizations to be performed under
a single set of hybridization conditions. Variations of hybridization
conditions due to base composition are no longer relevant as nucleic acids
with high A/T or G/C content ligate with equal efficiency. Consequently,
discrimination is very high between matches and mismatches, much higher
than has been achieved using other methodologies wherein the effects of
G/C content were only somewhat neutralized in high concentrations of
quaternary or tertiary amines such as, for example, 3 M tetramethyl
ammonium chloride. Further, hybridization conditions such as temperatures
of between about 22 °C to about 37 °C, salt concentrations of between about
0.05 M to about 0.5 M, and hybridization times of from less than about
one hour to about 14 hours (overnight), are also suitable for ligation.
Ligation reactions can be accomplished using a eukaryotic derived or a
prokaryotic derived ligase such as T4 DNA or RNA ligase. Methods for use
of these and other nucleic acid modifying enzymes are described in Current
Protocols in Molecular Biology (F.M. Ausubel et al., editors, John Wiley &
Sons, 1989).

Each probe of the probe array comprises a single-stranded
portion, an optional double-stranded portion and a variable sequence within
the single-stranded portion. These probes may be DNA, RNA, PNA, or any
combination thereof, and may be derived from natural sources or
recombinant sources, or be organically synthesized. Preferably, each probe
has one or more double stranded portions which are about 4 to about 30
nucleotides in length, preferably about 5 to about 15 nucleotides and more
preferably about 7 to about 12 nucleotides, and which may also be identical
within the various probes of the array; one or more single stranded portions
which are about 4 to 20 nucleotides in length, preferably between about 5 to
about 12 nucleotides and more preferably between about 6 to about 10
nucleotides; and a variable sequence within the single stranded portion which
is about 4 to 20 nucleotides in length and preferably about 4, 5, 6, 7 or 8
nucleotides in length. Overall probe sizes may range from as small as 8
nucleotides in length to 100 nucleotides and above. Preferably, sizes are
from about 12 to about 35 nucleotides, and more preferably, from about 12
to about 25 nucleotides in length.

Probe sequences may be partly or entirely known,
determinable or completely unknown. Known sequences can be created, for
example, by chemically synthesizing individual probes with a specified
sequence at each region. Probes with determinable variable regions may be
chemically synthesized with random sequences and the sequence
information determined separately. Either or both the single-stranded and
the double-stranded regions may comprise constant sequences such as, for
example, when an area of the probe or hybridized nucleic acid would benefit
from having a constant sequence as a point of reference in subsequent
analyses.

An advantage of this type of probe is in its structure.
Hybridization of the target nucleic acid is encouraged due to the favorable
thermodynamic conditions, including base-stacking interactions, established
by the presence of the adjacent double strandedness of the probe. Probes
may be structured with terminal single-stranded regions which consist
entirely or partly of variable sequences, internal single-stranded regions
which contain both constant and variable regions, or combinations of these
structures. Preferably, the probe has a single-stranded region at one
terminus and a double-stranded region at the opposite terminus.

Fragmented target sequences, preferably, will have a
distribution of terminal sequences sufficiently broad so that the nucleotide
sequence of the hybridized fragments will include the entire sequence of the
target nucleic acid. Consequently, the typical probe array will comprise a
collection of probes with sufficient sequence diversity in the variable
regions to hybridize, with complete or nearly complete discrimination, all
of the target sequence or the target-derived sequences. The resulting target
array will comprise the entire target sequence on strands of hybridized
probes. By way of example only, if the variable portion consisted of a four
nucleotide sequence (R=4) of adenine, guanine, thymine, and cytosine, the
total number of possible combinations (4^R) would be 4^4 or 256 different
nucleic acid probes. If the number of nucleotides in the variable sequence
was five, the number of different probes within the set would be 4^5 or 1,024.
In addition, it is also possible to utilize probes wherein the variable
nucleotide sequence contains gapped segments, or positions along the
variable sequence which will base pair with any nucleotide or at least not
interfere with adjacent base pairing.
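
The probe counts given above follow directly from enumerating every sequence of length R over the four bases. A minimal sketch, in which the function name `variable_sequences` is hypothetical and gapped or universal positions are not modeled:

```python
from itertools import product


def variable_sequences(r, alphabet="ACGT"):
    """Enumerate every possible variable sequence of length `r`."""
    return ["".join(bases) for bases in product(alphabet, repeat=r)]


if __name__ == "__main__":
    for r in (4, 5):
        probes = variable_sequences(r)
        # The count equals 4 ** r for a four-letter alphabet.
        print(r, len(probes), 4 ** r)
```

Because the count grows by a factor of four with each added position, the choice of R sets a direct trade-off between the diversity of the array and the number of distinct probes that must be prepared.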

A nucleic acid strand of the target array may be extended or
elongated enzymatically. Either the hybridized fragment or one or the other
of the probe strands can be extended. Extension reactions can utilize various
regions of the target array as a template. For example, when fragment
sequences are longer than the hybridizable portion of a probe having a 3'
single-stranded terminus, the probe will have a 3' overhang and a 5'
overhang after hybridization of the fragment. The now internal 3' terminus
of the one strand of the probe can be used as a primer to prime an extension
reaction using, for example, an appropriate nucleic acid polymerase and
chain elongating nucleotides. The extended strand of the probe will contain
sequence information of the entire hybridized fragment. Reaction mixtures
containing dideoxynucleotides will create a set of extended strands of
varying lengths and, preferably, a nested set of strands. As the fragments
have been initially sorted by hybridization to the array, each probe of the
array will contain sets of nucleic acids that represent each segment of the
target sequence. Base sequence information can be determined from each
extended probe. Compilation of the sequence information from the array,
which may require computer assistance with very large arrays, will allow
one to determine the sequence of the target. Depending on the structure of
the probe (e.g. 5' overhang, 3' overhang, internal single-stranded region),
strands of the probe or strands of hybridized nucleic acid containing target
sequence can also be enzymatically amplified by, for example, single primer
PCR reactions. Variations of this process may involve aspects of strand
displacement amplification, Qβ replicase amplification, self-sustained
sequence replication amplification and any of the various polymerase chain
reaction amplification technologies.

Extended nucleic acid strands of the probe can be mass
modified using a variety of techniques and methodologies. The most
straightforward may be to enzymatically synthesize the extension utilizing
a polymerase and nucleotide reagents, such as mass modified chain
elongating and chain terminating nucleotides. Mass modified nucleotides
incorporate into the growing nucleic acid chain. Mass modifications may
be introduced in most sites of the macromolecule which do not interfere
with the hydrogen bonds required for base pair formation during nucleic
acid hybridization. Typical modifications include modification of the
heterocyclic bases, modifications of the sugar moiety (ribose or
deoxyribose), and modifications of the phosphate group. Specifically, a
modifying functionality, which may be a chemical moiety, is placed at or
covalently coupled to the C2, N3, N7 or N8 positions of purines, or the N7
or N9 positions of deazapurines. Modifications may also be placed at the
C5 or C6 positions of pyrimidines (e.g. Figures 1A, 1B, 2A and 2B).
Examples of useful modifying groups include deuterium, F, Cl, Br, I, biotin,
fluorescein, iododicarbocyanine dye, SiR, Si(CH3)3, Si(CH3)2(C2H5),
Si(CH3)2(C2H5)2, Si(CH3)(C2H5)2, Si(C2H5)3, (CH2)nCH3, (CH2)nNR,
CH2CONR, (CH2)nOH, CH2F, CHF2 and CF3; wherein n is an integer and R
is selected from the group consisting of -H, deuterium and alkyls, alkoxys
and aryls of 1-6 carbon atoms, polyoxymethylene, monoalkylated
polyoxymethylene, polyethylene imine, polyamide, polyester, alkylated
silyl, hetero-oligo/polyaminoacid and polyethylene glycol (Figures 3 and 4).

Mass modifying functionalities may also be generated from
a precursor functionality such as -N3 or -XR, wherein X is: -OH, -NH2,
-NHR, -SH, -NCS, -OCO(CH2)nCOOH, -NHCO(CH2)nCOOH, -OSO2OH,
-OCO(CH2)nI or -OP(O-alkyl)-N-(alkyl)2, and n is an integer from 1 to 20;
and R is: -H, deuterium and alkyls, alkoxys or aryls of 1-6 carbon atoms,
such as methyl, ethyl, propyl, isopropyl, t-butyl, hexyl, benzyl, benzhydryl,
trityl, substituted trityl, aryl, substituted aryl, polyoxymethylene,
monoalkylated polyoxymethylene, polyethylene imine, polyamide,
polyester, alkylated silyl, heterooligo/polyaminoacid or polyethylene glycol.
These and other mass modifying functionalities which do not interfere with
hybridization can be attached to a nucleic acid either alone or in
combination. Preferably, combinations of different mass modifications are
utilized to maximize distinctions between nucleic acids having different
sequences.

Mass modifications may be major changes of molecular
weight, such as occurs with coupling between a nucleic acid and a
heterooligo/polyaminoacid, or more minor such as occurs by substituting
chemical moieties into the nucleic acid having molecular masses smaller
than the natural moiety. Non-essential chemical groups may be eliminated
or modified using, for example, an alkylating agent such as iodoacetamide.
Alkylation of nucleic acids with iodoacetamide has an additional advantage
that a reactive oxygen of the 3'-position of the sugar is eliminated. This
provides one less site per base for alkali cations, such as sodium, to interact.
Sodium, present in nearly all nucleic acids, increases the likelihood of
forming satellite adduct peaks upon ionization. Adduct peaks appear at a
slightly greater mass than the true molecule, which would greatly reduce the
accuracy of molecular weight determinations. These problems can be
addressed, in part, with matrix selection in mass spectrometric analysis, but
this only helps with nucleic acids of less than 20 nucleotides. Ammonium
(NH4+), which can substitute for the sodium cation (Na+) during ion
exchange, does not increase adduct formation. Consequently, another useful
mass modification is to remove alkali cations from the entire nucleic acid.
This can be accomplished by ion exchange with aqueous solutions of
ammonium such as ammonium acetate, ammonium carbonate, diammonium
hydrogen citrate, ammonium tartrate and combinations of these solutions.
Dissolving DNA in 3 M aqueous ammonium hydroxide neutralizes all the
acidic functions of the molecule. As there are no protons, there is a
significant reduction in fragmentation during procedures such as mass
spectrometry.
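
The position of sodium adduct peaks can be estimated with simple arithmetic. The sketch below assumes, for illustration only, that each adduct arises when one sodium ion (about 22.990 Da) replaces one proton (about 1.008 Da), shifting the peak by about 21.98 Da per sodium; the function name and the example parent mass are hypothetical.

```python
SODIUM_MASS = 22.990    # approximate average atomic mass, Da
HYDROGEN_MASS = 1.008   # approximate average atomic mass, Da
NA_FOR_H_SHIFT = SODIUM_MASS - HYDROGEN_MASS


def sodium_adduct_peaks(parent_mass, max_sodium):
    """Expected masses of the parent peak and of satellite peaks
    carrying 1..max_sodium sodium-for-proton substitutions."""
    return [round(parent_mass + k * NA_FOR_H_SHIFT, 3)
            for k in range(max_sodium + 1)]


if __name__ == "__main__":
    # Hypothetical 5,000 Da oligonucleotide with up to three sodium adducts.
    print(sodium_adduct_peaks(5000.0, 3))
```

A shift of about 22 Da per sodium is of the same order as the mass differences between individual nucleotides, which suggests why adduct peaks can reduce the accuracy of molecular weight determinations.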

Another mass modification is to utilize nucleic acids with
non-ionic polar phosphate backbones (e.g. PNA). Such nucleotides can be
generated from oligonucleoside phosphomonothioate diesters or by enzymatic
synthesis using nucleic acid polymerases and alpha- (α-) thio nucleoside
triphosphates and subsequent alkylation with iodoacetamide. Synthesis of
such compounds is straightforward and can be performed and the products
separated and isolated by, for example, analytical HPLC.

Mass modification of arrays can be performed before or after
target hybridization as the modifications do not interfere with hybridization
or with hybridized nucleic acids. This conditioning of the array is simple to
perform and easily adaptable in bulk. Probe arrays can therefore be
synthesized with no special manipulations. Only after the arrays are fixed to
solid supports, just when it would in fact be most convenient to perform
mass modification, would probes be conditioned.

Probe strands may also be mass modified subsequent to
synthesis by, for example, treating the extended strands with
an alkylating agent or a thiolating agent, or subjecting the nucleic acid to
cation exchange. Nucleic acids which can be modified include target
sequences, probe sequences and strands, extended strands of the probe and
other available fragments. Probes can be mass modified on either strand
prior to hybridization. Such arrays of mass modified or conditioned nucleic
acids can be bound to fragments containing the target sequence with no
interference to the fidelity of hybridization. Subsequent extension of either
strand of the probe, for example using Sanger sequencing techniques, and
using the target sequences as templates will create mass modified extended
strands. The molecular weights of these strands can be determined with
excellent accuracy.
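
To illustrate how molecular weights of a nested set of extended strands can yield base sequence, the following sketch reads a sequence from the mass differences between successive ladder members. It is a simplified illustration under stated assumptions: the residue masses are approximate average values for unmodified DNA nucleotides within a chain, terminator and mass modification offsets are ignored, and the function names, primer mass and tolerance are hypothetical.

```python
# Approximate average masses (Da) of unmodified DNA nucleotide residues.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def ladder_masses(extension, primer_mass=0.0):
    """Masses of a nested set: the primer plus 1, 2, ... n added bases."""
    masses, total = [], primer_mass
    for base in extension:
        total += RESIDUE_MASS[base]
        masses.append(round(total, 2))
    return masses


def read_ladder(masses, primer_mass=0.0, tolerance=2.0):
    """Recover the base order from successive mass differences."""
    sequence, previous = [], primer_mass
    for mass in sorted(masses):
        step = mass - previous
        # Assign the base whose residue mass is closest to the step.
        base = min(RESIDUE_MASS, key=lambda b: abs(RESIDUE_MASS[b] - step))
        if abs(RESIDUE_MASS[base] - step) > tolerance:
            raise ValueError(f"unassignable mass step: {step:.2f}")
        sequence.append(base)
        previous = mass
    return "".join(sequence)


if __name__ == "__main__":
    ladder = ladder_masses("GATTACA", primer_mass=5000.0)
    print(read_ladder(ladder, primer_mass=5000.0))
```

With these unmodified values the closest pair, A and T, differs by only about 9 Da, which gives some sense of why mass modifications that widen the differences between bases can facilitate sequencing by molecular weight.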

Probes may be in solution, such as in wells or on the surface
of a micro-tray, or attached to a solid support. Mass modification can occur
while the probes are fixed to the support, prior to fixation or upon cleavage
from the support, which can occur concurrently with ablation when analyzed
by mass spectrometry. In this regard, it can be important which strand is
released from the support upon laser ablation. Preferably, in such cases, the
probe is differentially attached to the support. One strand may be permanently
and the other temporarily attached or, at least, selectively releasable.

Examples of solid supports which can be used include a
plastic, a ceramic, a metal, a resin, a gel and a membrane. Useful types of
solid supports include plates, beads, microbeads, whiskers, combs,
hybridization chips, membranes, single crystals, ceramics and
self-assembling monolayers. A preferred embodiment comprises a
two-dimensional or three-dimensional matrix, such as a gel or hybridization
chip with multiple probe binding sites (Pevzner et al., J. Biomol. Struc. & Dyn.
9:399-410, 1991; Maskos and Southern, Nuc. Acids Res. 20:1679-84, 1992).
Hybridization chips can be used to construct very large probe arrays which
are subsequently hybridized with a target nucleic acid. Analysis of the
hybridization pattern of the chip can assist in the identification of the target
nucleotide sequence. Patterns can be manually or computer analyzed, but
it is clear that positional sequencing by hybridization lends itself to
computer analysis and automation. Algorithms and software have been
developed for sequence reconstruction which are applicable to the methods
described herein (R. Drmanac et al., J. Biomol. Struc. & Dyn. 5:1085-1102,
1991; P.A. Pevzner, J. Biomol. Struc. & Dyn. 7:63-73, 1989).
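
The general idea of reconstructing a sequence from a hybridization pattern can be shown with a toy example. The sketch below is not the algorithm of the cited references; it is a simple greedy extension that assumes every overlapping k-mer of the target was detected exactly once, that the starting k-mer is known, and that each overlap of k-1 bases is unique. The function name is hypothetical.

```python
def reconstruct(kmers, start):
    """Extend `start` one base at a time by finding the single remaining
    k-mer whose first k-1 bases match the current 3' end."""
    remaining = set(kmers) - {start}
    sequence = start
    overlap = len(start) - 1
    while remaining:
        suffix = sequence[-overlap:]
        candidates = [k for k in remaining if k.startswith(suffix)]
        if len(candidates) != 1:
            break  # dead end or ambiguous branch; stop rather than guess
        sequence += candidates[0][-1]
        remaining.remove(candidates[0])
    return sequence


if __name__ == "__main__":
    spectrum = {"ATGC", "TGCG", "GCGT", "CGTA", "GTAC"}
    print(reconstruct(spectrum, "ATGC"))
```

Repeated subsequences create branch points at which this greedy approach stops; resolving such ambiguities is one reason dedicated reconstruction algorithms and positional information are useful.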

Nucleic acid probes may be attached to the solid support by
covalent binding, such as by conjugation with a coupling agent, or by
covalent or non-covalent binding such as electrostatic interactions, hydrogen
bonds or antibody-antigen coupling, or by combinations thereof. Typical
coupling agents include biotin/avidin, biotin/streptavidin, Staphylococcus
aureus protein A/IgG antibody Fc fragment, and streptavidin/protein A
chimeras (T. Sano and C.R. Cantor, Bio/Technology 9:1378-81, 1991), or
derivatives or combinations of these agents. Nucleic acids may be attached
to the solid support by a photocleavable bond, an electrostatic bond, a
disulfide bond, a peptide bond, a diester bond or a combination of these sorts
of bonds. The array may also be attached to the solid support by a
selectively releasable bond such as 4,4'-dimethoxytrityl or its derivative.
Derivatives which have been found to be useful include 3 or 4
[bis-(4-methoxyphenyl)]-methyl-benzoic acid, N-succinimidyl-3 or 4
[bis-(4-methoxyphenyl)]-methyl-benzoic acid, N-succinimidyl-3 or 4
[bis-(4-methoxyphenyl)]-hydroxymethyl-benzoic acid, N-succinimidyl-3 or 4
[bis-(4-methoxyphenyl)]-chloromethyl-benzoic acid, and salts of these acids.

Binding may be reversible, or permanent where strong
associations would be critical. In addition, probes may be attached to solid
supports via spacer moieties between the probes of the array and the solid
support. Useful spacers include a coupling agent, as described above for
binding to other or additional coupling partners, or to render the attachment
to the solid support cleavable.

Cleavable attachments may be created by attaching cleavable
chemical moieties between the probes and the solid support such as an
oligopeptide, oligonucleotide, oligopolyamide, oligoacrylamide,
oligoethylene glycerol, alkyl chains of between about 6 to 20 carbon atoms,
and combinations thereof. These moieties may be cleaved with added
chemical agents, electromagnetic radiation or enzymes. Examples of
attachments cleavable by enzymes include peptide bonds which can be
cleaved by proteases and phosphodiester bonds which can be cleaved by
nucleases. Chemical agents such as β-mercaptoethanol, dithiothreitol (DTT)
and other reducing agents cleave disulfide bonds. Other agents which may
be useful include oxidizing agents, hydrating agents and other selectively
active compounds. Electromagnetic radiation such as ultraviolet, infrared
and visible light cleaves photocleavable bonds. Attachments may also be
reversible such as, for example, using heat or enzymatic treatment, or
reversible chemical or magnetic attachments. Release and reattachment can
be performed using, for example, magnetic or electrical fields.

Hybridized probes can provide direct or indirect information
about the hybridized sequence. Direct information may be obtained from
the binding pattern of the array wherein probe sequences are known or can
be determined. Indirect information requires additional analysis of a
plurality of nucleic acids of the target array. For example, a specific nucleic
acid sequence will have a unique or relatively unique molecular weight
depending on its size and composition. That molecular weight can be
determined, for example, by chromatography (e.g. HPLC), nuclear magnetic
resonance (NMR), high-definition gel electrophoresis, capillary
electrophoresis (e.g. HPCE), spectroscopy or mass spectrometry.
Preferably, molecular weights are determined by measuring the mass/charge
ratio with mass spectrometry technology.
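
By way of illustration only, the dependence of molecular weight on size
and composition can be sketched as follows. The residue masses are
approximate average values assumed for this sketch (they are not taken from
this specification), and the name oligo_mass is hypothetical.

```python
# Approximate average masses (Da) of nucleotide residues within a DNA chain.
# Assumed, commonly tabulated values; not figures from this document.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}

# Correction for a strand with free 5'-OH and 3'-OH ends (no terminal phosphate).
END_CORRECTION = -61.96


def oligo_mass(sequence: str) -> float:
    """Return the approximate average molecular weight of a DNA strand."""
    return sum(RESIDUE_MASS[base] for base in sequence.upper()) + END_CORRECTION


if __name__ == "__main__":
    # Strands of equal length that differ in one base differ in mass.
    for seq in ("ACGTA", "ACGTC", "ACGTG", "ACGTT"):
        print(seq, round(oligo_mass(seq), 2))
```

Strands with the same base composition in a different order have the same
mass under this model, which is one reason a molecular weight is described
as only "relatively unique".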

Mass spectrometry of biopolymers such as nucleic acids can
be performed using a variety of techniques (e.g. U.S. Patent Nos. 4,442,354;
4,931,639; 5,002,868; 5,130,538; 5,135,870; 5,174,962). Difficulties
associated with volatization of high molecular weight molecules such as
DNA and RNA have been overcome, at least in part, with advances in
techniques, procedures and electronic design. Further, only small quantities
of sample are needed for analysis, the typical sample being a mixture of 10
or so fragments. Quantities which range from between about 0.1 femtomole
to about 1.0 nanomole, preferably between about 1.0 femtomole to about
1000 femtomoles and more preferably between about 10 femtomoles to
about 100 femtomoles are typically sufficient for analysis. These amounts
can be easily placed onto the individual positions of a suitable surface or
attached to a support.

Another of the important features of this invention is that it is
unnecessary to volatize large lengths of nucleic acids to determine sequence
information. Using the methods of the invention, segments of the nucleic
acid target, discretely isolated into separate complexes on the target array,
can be sequenced and those sequence segments collated, making it
unnecessary to volatize the entire strand at once. Techniques which
can be used to volatize a nucleic acid fragment include fast atom
bombardment, plasma desorption, matrix-assisted laser
desorption/ionization, electrospray, photochemical release, electrical release,
droplet release, resonance ionization and combinations of these techniques.

In electrohydrodynamic ionization, thermospray, aerospray
and electrospray, the nucleic acid is dissolved in a solvent and injected, with
the help of heat, air or electricity, directly into the ionization chamber. If the
method of ionization involves a light beam, particle beam or electric
discharge, the sample may be attached to a surface and introduced into the
ionization chamber. In such situations, a plurality of samples may be
attached to a single surface or multiple surfaces and introduced
simultaneously into the ionization chamber and still analyzed individually.
The appropriate sector of the surface which contains the desired nucleic acid
can be moved proximate to the path of an ionizing beam. After the beam is
pulsed on and the surface bound molecules are ionized, a different sector of
the surface is moved into the path of the beam and a second sample, with the
same or different molecule, is analyzed without reloading the machine.
Multiple samples may also be introduced at electrically isolated regions of
a surface. Different sectors of the chip are connected to an electrical source
and ionized individually. The surface to which the sample is attached may
be shaped for maximum efficiency of the ionization method used. For field
ionization and field desorption, a pin or sharp edge is an efficient solid
support and for particle bombardment and laser ionization, a flat surface.

The goal of ionization for mass spectroscopy is to produce a
whole molecule with a charge. Preferably, matrix-assisted laser
desorption/ionization (MALDI) or electrospray (ES) mass spectroscopy is
used to determine molecular weight and, thus, sequence information from
the target array. It will be recognized by those of ordinary skill that a
variety of methods may be used which are appropriate for large molecules
such as nucleic acids. Typically, a nucleic acid is dissolved in a solvent and
injected into the ionization chamber using electrohydrodynamic ionization,
thermospray, aerospray or electrospray. Nucleic acids may also be attached
to a surface and ionized with a beam of particles or light. Particles which
have been successfully used include plasma (plasma desorption), ions (fast ion
bombardment) or atoms (fast atom bombardment). Ions have also been
produced with the rapid application of laser energy (laser desorption) and
electrical energy (field desorption).

In mass spectrometer analysis, the sample is ionized briefly by
a pulse of laser beams or by an electric field induced spray. The ions are
accelerated in an electric field and sent at a high velocity into the analyzer
portion of the spectrometer. The speed of the accelerated ion increases
with the charge (z) and decreases with the mass (m) of
the ion. The mass of the molecule may be deduced from the flight
characteristics of its ion. For small ions, the typical detector has a magnetic
field which functions to constrain the ion stream into a circular path. The
radius of the path of equally charged particles in a uniform magnetic field
increases with mass. That is, a heavier particle with the same
charge as a lighter particle will have a larger flight radius in a magnetic
field. It is generally considered to be impractical to measure the flight
characteristics of large ions such as nucleic acids in a magnetic field because
the relatively high mass to charge (m/z) ratio requires a magnet of unusual
size or strength. To overcome this limitation the electrospray method, for
example, can consistently place multiple charges on a molecule. Multiple
charges on a nucleic acid will decrease the mass to charge ratio, allowing a
conventional quadrupole analyzer to detect species of up to 100,000 daltons.

Nucleic acid ions generated by matrix assisted laser
desorption/ionization only have a unit charge and, because of their large
mass, generally require analysis by a time of flight (TOF) analyzer. Time of
flight analyzers are basically long tubes with a detector at one end. In the
operation of a TOF analyzer, a sample is ionized briefly and accelerated
down the tube. After detection, the time needed for travel down the flight
tube is calculated. The mass of the ion may be calculated from the time of
flight. TOF analyzers do not require a magnetic field and can detect unit
charged ions with a mass of up to 100,000 daltons. For improved resolution,
the time of flight mass spectrometer may include a reflectron, a region at the
end of the flight tube which negatively accelerates ions. Moving particles
entering the reflectron region, which contains a field of opposite polarity to
the accelerating field, are retarded to zero speed and then reverse accelerated
out with the same speed but in the opposite direction. In the use of an
analyzer with a reflectron, the detector is placed on the same side of the
flight tube as the ion source to detect the returned ions, and the effective
length of the flight tube and the resolving power are effectively doubled.
The calculation of mass to charge ratio from the time of flight data takes into
account the time spent in the reflectron.
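
The relationship between time of flight and mass can be sketched, under
simplifying assumptions, as follows. The sketch assumes an idealized linear
analyzer in which every ion starts at rest, is accelerated through the same
potential and then drifts through a field-free tube; time spent in the source
or in a reflectron is ignored. The function names and the example voltage and
tube length are hypothetical.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, coulombs
DALTON = 1.66053906660e-27   # unified atomic mass unit, kilograms


def flight_time(mass_da, charge, accel_volts, tube_m):
    """Drift time (s) through a field-free tube of length tube_m (m).

    Energy balance: charge * e * V = (1/2) * m * v**2.
    """
    speed = math.sqrt(2 * charge * E_CHARGE * accel_volts / (mass_da * DALTON))
    return tube_m / speed


def mass_to_charge(time_s, accel_volts, tube_m):
    """Invert flight_time: return m/z in daltons per unit charge."""
    return 2 * E_CHARGE * accel_volts * time_s ** 2 / (tube_m ** 2 * DALTON)


if __name__ == "__main__":
    # Hypothetical instrument settings: 20 kV acceleration, 1.5 m tube.
    for mass in (5_000, 20_000, 80_000):
        t = flight_time(mass, 1, 20_000, 1.5)
        print(mass, f"{t * 1e6:.1f} microseconds")
```

In this idealized model the flight time grows with the square root of m/z,
so a fourfold increase in mass doubles the time of flight for ions of equal
charge.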

Ions with the same charge to mass ratio will typically leave the
ion accelerators with a range of energies because the ionization region of
a mass spectrometer is not a point source. Ions generated further away from
the flight tube spend a longer time in the accelerator field and enter the
flight tube at a higher speed. Thus ions of a single species of molecule will
arrive at the detector at different times. In time of flight analysis, a longer
time in the flight tube in theory provides more sensitivity, but due to the
different speeds of the ions, the noise (background) will also be increased.
A reflectron, besides effectively doubling the effective length of the flight
tube, can reduce the error and increase sensitivity by reducing the spread of
detector impingement time of a single species of ions. An ion with a higher
velocity will enter the reflectron at a higher velocity and stay in the
reflectron region longer than a lower velocity ion. If the reflectron electrode
voltages are arranged appropriately, the peak width contribution from the
initial velocity distribution can be largely corrected for at the plane of the
detector. The correction provided by the reflectron leads to increased mass
resolution for all stable ions, those which do not dissociate in flight, in the
spectrum.

While a linear field reflectron functions adequately to reduce
noise and enhance sensitivity, reflectrons with more complex field strengths
offer superior correctional abilities and a number of complex reflectrons can
be used. The double stage reflectron has a first region with a weaker electric
field and a second region with a stronger electric field. The quadratic and
the curve field reflectron have an electric field which increases as a function
of the distance. These functions, as their names imply, may be a quadratic
or a complex exponential function. The dual stage, quadratic, and curve
field reflectrons, while more elaborate, are also more accurate than the linear
reflectron.

The detection of ions in a mass spectrometer is typically
performed using electron detectors. To be detected, the high mass ions
produced by the mass spectrometer are converted into either electrons or low
mass ions at a conversion electrode. These electrons or low mass ions are
then used to start the electron multiplication cascade in an electron
multiplier and further amplified with a fast linear amplifier. The signals
from multiple analyses of a single sample are combined to improve the
signal to noise ratio and the peak shapes, which also increases the accuracy
of the mass determination.

This invention is also directed to the detection of multiple
primary ions directly through the use of ion cyclotron resonance and Fourier
analysis. This is useful for the analysis of a complete sequencing ladder
immobilized on a surface. In this method, a plurality of samples are ionized
at once and the ions are captured in a cell with a high magnetic field. An RF
field excites the population of ions into cyclotron orbits. Because the
frequencies of the orbits are a function of mass, an output signal
representing the spectrum of the ion masses is obtained. This output is
analyzed by a computer using Fourier analysis which reduces the combined
signal to its component frequencies and thus provides a measurement of the
ion masses present in the ion sample. Ion cyclotron resonance and Fourier
analysis can determine the masses of all nucleic acids in a sample. The
application of this method is especially useful on a sequencing ladder.
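
As a rough sketch of the principle, a combined signal can be reduced to its
component frequencies and each frequency converted to a mass. The sketch
assumes the ideal cyclotron relation f = zeB/(2πm), a noise-free synthetic
signal, and a naive discrete Fourier transform in place of the optimized
transforms a real instrument would use; all function names are hypothetical.

```python
import cmath
import math

E_CHARGE = 1.602176634e-19   # elementary charge, coulombs
DALTON = 1.66053906660e-27   # unified atomic mass unit, kilograms


def cyclotron_frequency(mass_da, charge, b_tesla):
    """Ideal cyclotron frequency (Hz) of an ion in a uniform magnetic field."""
    return charge * E_CHARGE * b_tesla / (2 * math.pi * mass_da * DALTON)


def mass_from_frequency(freq_hz, charge, b_tesla):
    """Invert cyclotron_frequency: return the ion mass in daltons."""
    return charge * E_CHARGE * b_tesla / (2 * math.pi * freq_hz * DALTON)


def dominant_bins(signal, count):
    """Return the `count` strongest frequency bins of a naive O(N^2) DFT."""
    n = len(signal)
    magnitudes = []
    for k in range(1, n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal))
        magnitudes.append((abs(coeff), k))
    return sorted(k for _, k in sorted(magnitudes, reverse=True)[:count])


if __name__ == "__main__":
    # Synthetic transient containing two ion populations (bins 10 and 23).
    n, bin_width_hz, field_tesla = 128, 1000.0, 7.0
    signal = [math.sin(2 * math.pi * 10 * i / n)
              + 0.6 * math.sin(2 * math.pi * 23 * i / n) for i in range(n)]
    for k in dominant_bins(signal, 2):
        print(k, round(mass_from_frequency(k * bin_width_hz, 1, field_tesla)))
```

Because frequency is inversely proportional to mass at fixed charge and
field strength, the heavier members of a ladder appear at the lower
frequencies of the spectrum.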

The data from mass spectrometry, either performed singly or
in parallel (multiplexed), can determine the molecular mass of a nucleic acid
sample. The molecular mass, combined with the known sequence of the
sample, can be analyzed to determine the length of the sample. Because
different bases have different molecular weights, the output of a high
resolution mass spectrometer, combined with the known sequence and
reaction history of the sample, will determine the sequence and length of the
nucleic acid analyzed. In the mass spectroscopy of a sequencing ladder,
generally the base sequence of the primers is known. From a known
sequence of a certain length, the added base of a sequence one base longer
can be deduced by a comparison of the mass of the two molecules. This
process is continued until the complete sequence of a sequencing ladder is
determined.
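
The stepwise deduction described above can be sketched as follows. This is
a minimal illustration, assuming that consecutive members of the ladder
differ by exactly one nucleotide residue and using approximate average
residue masses that are not taken from this specification; the name
read_ladder, the tolerance and the primer mass are hypothetical.

```python
# Approximate average masses (Da) of DNA nucleotide residues (assumed values).
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def read_ladder(masses, tolerance=2.0):
    """Infer the added bases from consecutive mass differences of a ladder."""
    ordered = sorted(masses)
    bases = []
    for lighter, heavier in zip(ordered, ordered[1:]):
        delta = heavier - lighter
        # Pick the residue whose mass lies closest to the observed step.
        base, residue = min(RESIDUE_MASS.items(),
                            key=lambda item: abs(item[1] - delta))
        if abs(residue - delta) > tolerance:
            raise ValueError(f"mass step {delta:.2f} matches no single base")
        bases.append(base)
    return "".join(bases)


if __name__ == "__main__":
    # Build a synthetic ladder on a primer of hypothetical mass 3000.0 Da.
    ladder = [3000.0]
    for base in "GATTACA":
        ladder.append(ladder[-1] + RESIDUE_MASS[base])
    print(read_ladder(ladder))
```

The closest pair of residue masses in this table (A and T) differs by about
9 daltons, which indicates the mass accuracy such a comparison would need.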

Another embodiment of the invention is directed to a method
for detecting a target nucleic acid. As before, a set of nucleic acids
complementary or homologous to a sequence of the target is hybridized to
an array of nucleic acid probes. The molecular weights of the hybridized
nucleic acids are determined by, for example, mass spectrometry and the
nucleic acid target is detected by the presence of its sequence in the sample.
As the object is not to obtain extensive sequence information, probe arrays
may be fairly small with the critical sequences, the sequences to be detected,
repeated in as many variations as possible. Variations may have greater than
95% homology to the sequence of interest, greater than 80%, greater than
70% or greater than about 60%. Variations may also have additional
sequences not required or present in the target sequence to increase or
decrease the degree of hybridization. Sensitivity of the array to the target
sequence is increased while reducing and hopefully eliminating the number
of false positives.

Target nucleic acids to be detected may be obtained from a
biological sample, an archival sample, an environmental sample or another
source expected to contain the target sequence. For example, samples may
be obtained from biopsies of a patient and the presence of the target
sequence is indicative of the disease or disorder such as, for example, a
neoplasm or an infection. Samples may also be obtained from
environmental sources such as bodies of water, soil or waste sites to detect
the presence of, and possibly identify, organisms and microorganisms which
may be present in the sample. The presence of particular microorganisms in
the sample may be indicative of a dangerous pathogen or that the normal flora
is present.

Another embodiment of the invention is directed to the arrays
of nucleic acid probes useful in the above-described methods and
procedures. These probes comprise a first strand and a second strand
wherein the first strand is hybridized to the second strand forming a
double-stranded portion, a single-stranded portion and a variable sequence
within the single-stranded portion. The array may be attached to a solid
support such as a material that facilitates volatization of nucleic acids for
mass spectrometry. Typically, arrays comprise large numbers of probes such as
less than or equal to about 4^R different probes, where R is the length in
nucleotides of the variable sequence. When utilizing arrays for large scale
sequencing, larger arrays can be used whereas arrays which are used for
detection of specific sequences may be fairly small, as many of the potential
sequence combinations will not be necessary.
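
For illustration, the upper bound of 4^R different probes corresponds to
every possible variable sequence of length R. A minimal sketch, in which the
function name variable_sequences is hypothetical:

```python
from itertools import product


def variable_sequences(r):
    """Enumerate all 4**r variable sequences of length r over A, C, G and T."""
    return ["".join(bases) for bases in product("ACGT", repeat=r)]


if __name__ == "__main__":
    for r in (3, 4, 5):
        print(r, len(variable_sequences(r)))
```

A variable sequence of five nucleotides therefore requires at most 1024
probes for a complete array, while a detection array would carry only the
subset of these sequences that is of interest.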

Arrays may also comprise nucleic acid probes which are
entirely single-stranded and nucleic acids which are single-stranded, but
possess hairpin loops which create double-stranded regions. Such structures
can function in a manner similar if not identical to the partially
single-stranded probes, which comprise two strands of nucleic acid, and have
the additional advantage of thermodynamic energy available in the secondary
structure.

Arrays may be in solution or fixed on a solid support through
streptavidin-biotin interactions or other suitable coupling agents. Arrays
may also be reversibly fixed to the solid support using, for example,
chemical moieties which can be cleaved with electromagnetic radiation,
chemical agents and the like. The solid support may comprise materials
such as matrix chemicals which assist in the volatization process for mass
spectrometric analysis. Such chemicals include nicotinic acid,
3-hydroxypicolinic acid, 2,5-dihydroxybenzoic acid, sinapinic acid, succinic
acid, glycerol, urea and Tris-HCl, pH about 7.3.

Another embodiment of the invention is directed to
sequencing double-stranded nucleic acids using strand-displacement
polymerization. With this method it is unnecessary to denature the double
strands to obtain sequence information. Strand-displacement polymerization
creates a new strand while simultaneously displacing the existing strand.
Techniques for incorporating label into the growing strand are well known
and the newly polymerized strand is easily detected by, for example, mass
spectrometry.
Target nucleic acid or nucleic acids containing sequences that
correspond to the sequence of the target are digested, for example, with
restriction enzymes, in one or more steps to create a set of fragments which
are partially single-stranded and partially double-stranded. Another set of
nucleic acids, the probes, are also partially single-stranded and partially
double-stranded. These probes preferably contain a variable or constant
region within the single-stranded portion of the terminus of each fragment
(5'- or 3'-overhangs). Probes or fragments are treated with a phosphatase to
remove phosphate groups from the 5'-termini of the nucleic acids.
Phosphatase treatment prevents nucleic acid ligation by ligase, which
requires a terminal 5'-phosphate to covalently link to a 3'-hydroxyl.
Single-stranded regions of the fragments are hybridized to single-stranded
regions of the probes forming an array of hybridized target/probe complexes.
Adjacent or abutting nucleic acid strands of the complex are ligated,
covalently joining a strand of the fragment to a strand of the probe.
Phosphatase treatment prevents both self-ligation of phosphatase-treated
nucleic acids and ligation between the 5'-termini of phosphatased nucleic
acids and the 3'-termini of untreated nucleic acids. These complexes are
treated with a nucleic acid polymerase that recognizes and binds to the nick
in the unligated strand to initiate polymerization. The polymerase
synthesizes a new strand using the ligated strand as a template, while
displacing the complementary strand. The reaction may be supplemented
with labeled or mass modified nucleotides (e.g. mass modifications at
positions C2, N3, N7 or C8 of purine, or at N7 or N9 of deazapurine) or
other detectable markers that will allow for the detection of new synthesis.
Either the probes or the fragments may be fixed to a solid support such as
a plastic or glass surface, membrane or structure (magnetic bead) which
eliminates the need for repetitive extractions or other purification of nucleic
acids between steps.

Preferably, double-stranded nucleic acids containing target
sequences are obtained by polymerase chain reaction or enzymatic digestion
(e.g. restriction enzymes) of the target sequence. Target sequences may be
DNA, RNA, RNA/DNA hybrids, cDNA, PNA or modifications or
combinations thereof and are preferably from about 10 to about 1,000
nucleotides in length, more preferably, from about 20 to about 500
nucleotides in length, and even more preferably, from about 35 to about 250
nucleotides in length. 5'-termini of the nucleic acid fragments or probes may
be dephosphorylated with a phosphatase, such as alkaline or calf intestinal
phosphatase, which eliminates the action of a nucleic acid ligase. Upon
hybridization of fragment to probe, only one of the two internal 5'-3'
junctions contains a 5'-phosphate and is capable of ligation. The second
junction appears as a nick in a strand of the complex. Nucleic acid
polymerases, such as Klenow, recognize the nick and synthesize a new
strand while displacing the complementary, ligated strand. Chain elongation
can proceed in the presence of, for example, nucleotide triphosphates and
chain terminating nucleotides. Nucleic acid synthesis terminates when a
dideoxynucleotide is incorporated into the elongating strand. The resulting
fragments represent a nested set of the sequence of the target. Precursor
nucleotides may be labeled with, for example, mass modifications. The
mass modified fragments can be easily analyzed by mass spectrometry to
determine the sequence of the target. Complexes may further comprise
single-stranded binding protein (SSB; E. coli) which increases stability of
the complex and facilitates polymerase action. Bands otherwise obscured are
more easily detected. SSB can be used to sequence fragments of greater
than 100 nucleotides, preferably greater than 150 nucleotides and more
preferably greater than 200 nucleotides.

This method is generally useful for manual or automated
nucleic acid sequencing, and especially useful for identifying and
sequencing a single or group of nucleic acid species in a mixed background
containing a plurality of species of different sequences. In this method,
selection is performed upon hybridization and ligation of fragments to
probes. Probes may be designed to contain a common or variable sequence
within the single-stranded region that is complementary to a sequence of the
fragment to be identified and, if desired, sequenced. Stringency of
fragment/probe hybridization can be adjusted by methods well-known to
those of ordinary skill to match desired conditions of selection. For
example, the single-stranded region of the probe can be designed to contain
a specific sequence only found on the single-stranded region of the nucleic
acid fragment of interest. Alternatively, multiple probes containing multiple
variable regions may be used to select for those fragment sequences which
may be longer than the length of the single-stranded region of any one
probe. Hybridization and ligation selects the specific fragment from a
complex mixture of different fragments and only that specific fragment is
subsequently sequenced.
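
The selection step can be pictured with a short sketch. It assumes perfect
Watson-Crick pairing only, with every overhang written 5' to 3'; real
hybridization also depends on stringency, and the function names and
fragment labels are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the strand that pairs with seq, also written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def capture(fragment_overhangs, probe_overhang):
    """Return names of fragments whose overhang pairs perfectly with the probe.

    fragment_overhangs maps a fragment name to its single-stranded overhang.
    """
    wanted = reverse_complement(probe_overhang)
    return [name for name, overhang in fragment_overhangs.items()
            if overhang == wanted]


if __name__ == "__main__":
    mixture = {"fragment_1": "GATTC", "fragment_2": "CCCCC", "fragment_3": "GATTA"}
    print(capture(mixture, "GAATC"))
```

Only the fragment whose overhang is the exact reverse complement of the
probe overhang is returned, mirroring the way hybridization followed by
ligation is said to select a single fragment from a complex mixture.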

Probes are typically from about 15 to about 200 nucleotides
in length, but can be larger or smaller depending on the particular application.
Single-stranded regions of the probes may be about 3, 4, 5, 6, 7, 8, 9, 10, 12,
15, 20, 22, 25 or 30 nucleotides in length or larger. For probes containing



WO 96/32504 



PCT/US96/05136



48 

a variable region within the single-stranded region, the length of this
variable region may be the same or smaller than the length of the entire
single-stranded portion. Variable regions may be distinct between probes
or common within sets of probes. The double-stranded region of the probe
is typically larger than the single-stranded region and may be about 4, 5, 6,
7, 8, 9, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 35, 40 or 50 nucleotides in
length or larger. Probes may also be modified to facilitate attachment to a
solid support or other surfaces, or modified to be individually detectable for
identification or other purposes. Sets of nucleic acids, either fragments or
probes, preferably contain greater than 10^2, 10^3, 10^4, 10^5, 10^6, 10^7,
10^8, 10^9 or 10^10 different members.

Another embodiment of the invention is directed to kits for
detecting a sequence of a target nucleic acid. An array of nucleic acid
probes is fixed to a solid support which may be coated with a matrix
chemical that facilitates volatization of nucleic acids for mass spectrometry.
Kits can be used to detect diseases and disorders in biological samples by
detecting specific nucleic acid sequences which are indicative of the
disorder. Probes may be labeled with detectable labels which only become
detectable upon hybridization with a correctly matched target sequence.
Detectable labels include radioisotopes, metals, luminescent or
bioluminescent chemicals, fluorescent chemicals, enzymes and
combinations thereof.

Another embodiment of the invention is directed to nucleic
acid sequencing systems which comprise a mass spectrometer, a computer
loaded with appropriate software for analysis of nucleic acids and an array
of probes which can be used to capture a target nucleic acid sequence.
Systems may be manual or automated as desired.



The following experiments are offered to illustrate
embodiments of the invention, and should not be viewed as limiting the
scope of the invention.

Examples

Example 1 Preparation of Target Nucleic Acid.

Target nucleic acid is prepared by restriction endonuclease
cleavage of cosmid DNA. The properties of type II and other restriction
nucleases that cleave outside of their recognition sequences were exploited.
A restriction digestion of a 10 to 50 kb DNA sample with such an enzyme
produced a mixture of DNA fragments most of which have unique ends.
Recognition and cleavage sites of useful enzymes are shown in Table 1.

Table 1

Restriction Enzymes and Recognition Sites for PSBH

Mwo I       GCNNNNN-NNGC
            CGNN-NNNNNCG

BsiY I      CCNNNNN-NNGG
            GGNN-NNNNNCC

ApaB I      GCANNNNN-TGC
            CGT-NNNNNACG

Mnl I       CCTCN7
            GGAGN6

TspR I      NNCAGTGNN
            NNGTCACNN

Cje I       CCANNNNNN-GTNNNN
            GGTNNNNNN-CANNNN

CjeP I      CCANNNNN-NNTCNN
            GGTNNNNN-NNAGNN

One restriction enzyme, ApaB I, with a 6 base pair
recognition site may also be used. DNA sequencing is best served by
enzymes that produce average fragment lengths comparable to the lengths
of DNA sequencing ladders analyzable by mass spectrometry. At present
these lengths are about 100 bases or less.

BsiY I and Mwo I restriction endonucleases are used together
to digest DNA in preparation of PSBH. Target DNA is cleaved to
completion and complexed with PSBH probes either before or after melting.
The fraction of fragments with unique ends or degenerate ends depends on
the complexity of the target sequence. For example, a 10 kilobase clone
would yield on average 16 fragments or a total of 32 ends, since each
double-stranded DNA target produces two ligatable 3' ends. With 1024 possible
ends, Poisson statistics (Table 2) predict that there would be 3%
degeneracies. In contrast, a 40 kilobase cosmid insert would yield 64
fragments or 128 ends, of which 12% would be degenerate, and a
50 kilobase sample would yield 80 fragments or 160 ends. Some of these
would surely be degenerate. Up to at least 100 kilobases, the larger the target
the more sequences are available from each multiplex DNA sample
preparation. With a 100 kilobase target, 27% of the targets would be
degenerate.
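
The percentages quoted above appear consistent with a simple Poisson
estimate in which the fraction of ends that remain unique is exp(-n/N), where
n is the number of ends in the digest and N is the number of possible end
sequences. This model is an assumption inferred from the quoted figures
rather than a formula stated in the text, and the function name
unique_fraction is hypothetical.

```python
import math

POSSIBLE_ENDS = 4 ** 5  # 1024 distinct five-base ends


def unique_fraction(ends, possible_ends=POSSIBLE_ENDS):
    """Poisson estimate of the fraction of ends shared with no other end."""
    return math.exp(-ends / possible_ends)


if __name__ == "__main__":
    # 16 fragments, hence 32 ends, per 10 kilobases as quoted in the text.
    for size_kb in (10, 40, 100):
        ends = 32 * size_kb // 10
        unique = unique_fraction(ends)
        print(size_kb, ends, round(unique, 2), f"{1 - unique:.0%} degenerate")
```

Under this assumption the sketch gives unique fractions of 0.97, 0.88 and
0.73 for 10, 40 and 100 kilobase targets, corresponding to the 3%, 12% and
27% degeneracies mentioned above.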

Table 2

Poisson Distribution of Restriction Enzyme Sites

Target size     Mwo I                     TspR I
(kb)            Sequencing   Assembly     Sequencing   Assembly

10              0.97         0.60         0.94         0.94
40              0.88         0.14         0.80         0.80
100             0.73         0.01         0.57         0.57

With BsiY I and Mwo I, any restriction site that yields a unique
5 base end may be captured twice and the resulting sequence data obtained
will read away from the site in both directions (Figure 5). With the
knowledge of three bases of overlapping sequence at the site, this sorts all
sequences into 64 different categories. With 10 kilobase targets, 60% will
contain fragments and, thus, sequence assembly is automatic.

Two array capture methods can be used with Mwo I and BsiY I.
In the first method, conventional five base capture is used. Because the
two target bases adjacent to the capture site are known, as they form part of
the restriction enzyme recognition sequence, an alternative capture strategy
would build the complement of these two bases into the capture sequence.
Seven base capture is thermodynamically more stable, but less
discriminating against mismatches.

TspR I is another commercially available restriction enzyme
with properties that are very attractive for use in PSBH-mediated Sanger
sequencing. The method for using TspR I is shown in Figure 6. TspR I has
a five base recognition site and cuts two bases outside this site on each
strand to yield nine base 3' single-stranded overhangs. These can be
captured with partially duplex probes with complementary nine base
overhangs. Because only four bases are not specified by enzyme
recognition, TspR I digestion results in only 256 types of cleavage sites. With
human DNA the average fragment length that results is 1370 bases. This
enzyme is ideal for generating long sequence ladders, which are useful as
input to long thin gel sequencing where reads up to a kilobase are common. A
typical human cosmid yields about 30 TspR I fragments or 60 ends. Given
the length distribution expected, many of these could not be sequenced fully
from one end. With 256 possible overhangs, Poisson statistics (Table 2)
indicate that 80% of adjacent fragments can be assembled with no additional
labor. Thus, very long blocks of continuous DNA sequence are produced.

Three additional restriction enzymes are also useful. These
are Mnl I, Cje I and CjeP I (Table 1). The first has a four base site with one
A+T and should give smaller human DNA fragments on average than Mwo I or
BsiY I. The latter two have unusual interrupted five base recognition sites
and might supplement TspR I.

Target DNA may also be prepared by tagged PCR. It is
possible to add a preselected five base 3' terminal sequence to a target DNA
using a PCR primer five bases longer than the known target sequence
priming site. Samples made in this way can be captured and sequenced
using the PSBH approach based on the five base tag. A biotin was used to
allow purification of the complementary strand prior to use as an
immobilized sequencing template. A biotin may also be placed on the tag.
After capture of the duplex PCR product by streptavidin-coated magnetic
microbeads, the desired strand (needed to serve as a sequencing template)
could be denatured from the duplex and used to contact the entire probe
array. For multiplex sample preparation, a series of different five base




tagged primers would be employed, ideally in a single multiplex PCR
reaction. This approach also requires knowing enough target sequence for
unique PCR amplification and is more useful for shotgun sequencing or
comparative sequencing than for de novo sequencing.
Example 2 Basic Aspects of Positional Sequencing by Hybridization.

An examination of the potential advantages of stacking
hybridization has been carried out by both calculations and pilot
experiments. Some calculated Tm's for perfect and mismatched duplexes are
shown in Figure 7. These are based on average base compositions. The
calculations revealed that the binding of a second oligomer next to a pre-
formed duplex provides an extra stability equal to about two base pairs and
that mis-pairing seems to have a larger consequence on stacking
hybridization than it does on ordinary hybridization. Other types of mis-
pairing are less destabilizing, but these can be eliminated by requiring a
ligation step. In standard SBH, a terminal mismatch is the least
destabilizing event, and leads to the greatest source of ambiguity or
background. For an octanucleotide complex, an average terminal mismatch
leads to a 6°C lowering in Tm. For stacking hybridization, a terminal
mismatch on the side away from the pre-existing duplex is the least
destabilizing event. For a pentamer, this leads to a drop in Tm of 10°C.
These considerations indicate that the discrimination power of stacking
hybridization in favor of perfect duplexes is greater than that of ordinary SBH.
Example 3 Preparation of Model Arrays.

In a single synthesis, all 1024 possible single-stranded probes
with a constant 18 base stalk followed by a variable 5 base extension can be
created. The 18 base extension is designed to contain two restriction
enzyme cutting sites. Hga I generates a 5 base, 5' overhang consisting of the
variable bases N5. Not I generates a 4 base, 5' overhang at the constant end
of the oligonucleotide. The synthetic 23-mer mixture hybridized with a
complementary 18-mer forms a duplex which can be enzymatically
extended to form all 1024 23-mer duplexes. These are cloned by, for
example, blunt end ligation, into a plasmid which lacks Not I sites. Colonies
containing the cloned 23-base insert are selected and each clone contains
one unique sequence. DNA minipreps can be cut at the constant end of the
stalk, filled in with biotinylated pyrimidines and cut at the variable end of
the stalk to generate the 5 base 5' overhang. The resulting nucleic acid is
fractionated by Qiagen columns (nucleic acid purification columns) to
discard the high molecular weight material. The nucleic acid probe will then
be attached to a streptavidin-coated surface. This procedure could easily be
automated in a Beckman Biomek or equivalent chemical robot to produce
many identical arrays of probes.
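
The count of 1024 follows from the five variable positions (4^5). A minimal sketch of the enumeration is given here for illustration; the stalk is a placeholder of 18 unspecified bases, since the actual stalk sequence with its Hga I and Not I sites is not given in this example, and the function name is hypothetical.

```python
from itertools import product

# Placeholder for the constant 18 base stalk (hypothetical: the real
# sequence, including its restriction sites, is not stated in the text).
STALK = "N" * 18


def all_probes(stalk: str, n_variable: int = 5) -> list[str]:
    """Return every stalk + variable-extension sequence, written 5'->3'."""
    return [stalk + "".join(bases) for bases in product("ACGT", repeat=n_variable)]


probes = all_probes(STALK)
print(len(probes), len(probes[0]))
```

Each entry is a 23-mer, and no two entries share the same variable extension.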

The initial array contains about a thousand probes. The
particular sequence at any location in the array will not be known.
However, the array can be used for statistical evaluation of the signal to
noise ratio and the sequence discrimination for different target molecules
under different hybridization conditions. Hybridization with known nucleic
acid sequences allows for the identification of particular elements of the
array. A sufficient set of hybridizations would train the array for any
subsequent sequencing task. Arrays are partially characterized until they
have the desired properties. For example, the length of the oligonucleotide
duplex, the mode of its attachment to a surface and the hybridization
conditions used can all be varied using the initial set of cloned DNA probes.
Once the sort of array that works best is determined, a complete and fully
characterized array can be constructed by ordinary chemical synthesis.
Example 4 Preparation of Specific Probe Arrays. 

With positional SBH, one potential trick to compensate for
some variations in stability among species due to GC content variation is to
provide GC rich stacking duplex adjacent to AT rich overhangs and AT rich
stacking duplex adjacent to GC rich overhangs. Moderately dense arrays can
be made using a typical x-y robot to spot the biotinylated compounds
individually onto a streptavidin-coated surface. Using such robots, it is
possible to make arrays of 2 x 10^4 samples in 100 to 400 cm^2 of nominal
surface. Commercially available streptavidin-coated beads can be adhered
permanently to plastics like polystyrene by exposing the plastic first to a
brief treatment with an organic solvent like triethylamine. The resulting
plastic surfaces have enormously high biotin binding capacity because of the
very high surface area that results.

In certain experiments, the need for attaching oligonucleotides
to surfaces may be circumvented altogether, and oligonucleotides attached
to streptavidin-coated magnetic microbeads used, as already done in pilot
experiments. The beads can be manipulated in microtiter plates. A
magnetic separator suitable for such plates can be used, including the newly
available compressed plates. For example, the 18 by 24 well plates
(Genetix, Ltd.; USA Scientific Plastics) would allow containment of the
entire array in 3 plates. This format is well handled by existing chemical
robots. It is preferable to use the more compressed 36 by 48 well format so
the entire array would fit on a single plate. The advantages of this approach
for all the experiments are that any potential complexities from surface
effects can be avoided and already-existing liquid handling, thermal control
and imaging methods can be used for all the experiments.
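
The plate counts quoted above follow from simple division. The short sketch below assumes one probe per well and the 1024-probe array of Example 3; the function name is illustrative.

```python
import math


def plates_needed(n_probes: int, rows: int, cols: int) -> int:
    """Smallest number of plates holding n_probes at one probe per well."""
    return math.ceil(n_probes / (rows * cols))


standard = plates_needed(1024, 18, 24)    # 432 wells per plate
compressed = plates_needed(1024, 36, 48)  # 1728 wells per plate
print(standard, compressed)
```

With these inputs the 18 by 24 format needs three plates and the 36 by 48 format needs one, matching the text.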

Lastly, a rapid and highly efficient method to print arrays has
been developed. Master arrays are made which direct the preparation of
replicas or appropriate complementary arrays. A master array is made
manually (or by a very accurate robot) by sampling a set of custom DNA
sequences in the desired pattern and then transferring these sequences to the
replica. The master array is just a set of all 1024 to 4096 compounds printed
by multiple headed pipettes and compressed by offsetting. A potentially
more elegant approach is shown in Figure 8. A master array is made and
used to transfer components of the replicas in a sequence-specific way. The
sequences to be transferred are designed to contain the desired 5 or 6 base
5' variable overhang adjacent to a unique 15 base DNA sequence.

The master array consists of a set of streptavidin bead-impregnated,
plastic coated metal pins. Immobilized biotinylated DNA
strands that consist of the variable 5 or 6 base segment plus the constant 15
base segment are at each tip. Any unoccupied sites on this surface are filled
with excess free biotin. To produce a replica chip, the master array is
incubated with the complement of the 15 base constant sequence, 5'-labeled
with biotin. Next, DNA polymerase is used to synthesize the complement
of the 5 or 6 base variable sequence. Then the wet pin array is touched to
the streptavidin-coated surface of the replica and held at a temperature above
the Tm of the complexes on the master array. If there is insufficient liquid
carryover from the pin array for efficient sample transfer, the replica array
could first be coated with spaced droplets of solvent, either held in concave
cavities or delivered by a multi-head pipettor. After the transfer, the replica
chip is incubated with the complement of the 15 base constant sequence to
reform the double-stranded portions of the array. The basic advantage of
this scheme is that the master array and transfer compounds are made only
once and the manufacture of replica arrays can proceed almost endlessly.
Example 5 Attachment of Nucleic Acid Probes to Solid Supports.

Nucleic acids may be attached to silicon wafers or to beads.
A silicon solid support was derivatized to provide iodoacetyl functionalities
on its surface. Derivatized solid supports were bound to disulfide-containing
oligodeoxynucleotides. Alternatively, the solid support may be coated with
streptavidin or avidin and bound to biotinylated DNA.

Covalent attachment of oligonucleotide to derivatized chips:
Silicon wafers are chips with an approximate weight of 50 mg. To maintain
uniform reaction conditions, it was necessary to determine the exact weight
of each chip and select chips of similar weights for each experiment. The
reaction scheme for this procedure is shown in Figure 9.

To derivatize the chip to contain the iodoacetyl functionality,
an anhydrous solution of 25% (by volume) 3-aminopropyltriethoxysilane
in toluene was prepared under argon and aliquotted (700 ul) into tubes. A
50 mg chip requires approximately 700 ul of silane solution. Each chip was
flamed to remove any surface contaminants from its manufacture and
dropped into the silane solution. The tube containing the chip was placed
under an argon environment and shaken for approximately three hours.
After this time, the silane solution was removed and the chips were washed
three times with toluene and three times with dimethyl sulfoxide (DMSO).
A 10 mM solution of N-succinimidyl(4-iodoacetyl)aminobenzoate (SIAB)
(Pierce Chemical Co.; Rockford, IL) was prepared in anhydrous DMSO and
added to the tube containing a chip. Tubes were shaken under an argon
environment for 20 minutes. The SIAB solution was removed and after
three washes with DMSO, the chip was ready for attachment to
oligonucleotides.

Some oligonucleotides were labeled so the efficiency of
attachment could be monitored. Both 5' disulfide-containing
oligodeoxynucleotides and unmodified oligodeoxynucleotides were
radiolabeled using terminal deoxynucleotidyl transferase enzyme and
standard techniques. In a typical reaction, 0.5 mM of disulfide-containing
oligodeoxynucleotide mix was added to a trace amount of the same species
that had been radiolabeled as described above. This mixture was incubated
with dithiothreitol (DTT) (6.2 umol, 100 mM) and
ethylenediaminetetraacetic acid (EDTA) pH 8.0 (3 umol, 50 mM). EDTA
served to chelate any cobalt that remained from the radiolabeling reaction
that would complicate the cleavage reaction. The reaction was allowed to
proceed for 5 hours at 37°C. With the cleavage reaction essentially
complete, the free thiol-containing oligodeoxynucleotide was isolated using
a Chromaspin-10 column.

Similarly, Tris-(2-carboxyethyl)phosphine (TCEP) (Pierce
Chemical Co.; Rockford, IL) has been used to cleave the disulfide.
Conditions utilize TCEP at a concentration of approximately 100 mM in pH
4.5 buffer. It is not necessary to isolate the product following the reaction
since TCEP does not competitively react with the iodoacetyl functionality.

To each chip which had been derivatized to contain the
iodoacetyl functionality was added a 10 uM solution of the
oligodeoxynucleotide at pH 8. The reaction was allowed to proceed
overnight at room temperature. In this manner, two different
oligodeoxynucleotides have been examined for their ability to bind to the
iodoacetyl silicon wafer. The first was the free thiol-containing
oligodeoxynucleotide already described. In parallel with the free thiol-
containing oligodeoxynucleotide reaction, a negative control reaction has
been performed that employs a 5' unmodified oligodeoxynucleotide. This
species has similarly been 3' radiolabeled, but due to the unmodified 5'
terminus, the non-covalent, non-specific interactions may be determined.
Following the reaction, the radiolabeled oligodeoxynucleotides were
removed and the chips were washed 3 times with water and quantitation
proceeded.

To determine the efficiency of attachment, chips of the wafer
were exposed to a phosphorimager screen (Molecular Dynamics). This
exposure usually proceeded overnight, but occasionally for longer periods
of time depending on the amount of radioactivity incorporated. For each
different oligodeoxynucleotide utilized, reference spots were made on
polystyrene in which the molar amount of oligodeoxynucleotide was known.
These reference spots were also exposed to the phosphorimager screen.
Upon scanning the screen, the quantity (in moles) of oligodeoxynucleotide
bound to each chip was determined by comparing the counts to the specific
activities of the references. Using the weight of each chip, it is possible to
calculate the area of the chip:

(g of chip) (1130 mm^2/g) = x mm^2

By incorporating this value, the amount of oligodeoxynucleotide bound to
each chip may be reported in fmol/mm^2. It is necessary to divide this value
by two since a radioactive signal of 32P is strong enough to be read through
the silicon wafer. Thus the instrument is essentially recording the
radioactivity from both sides of the chip.
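
The quantitation described above can be sketched as a small helper. The 1130 mm^2/g factor and the division by two come from the text; the function names and the example amount of bound oligodeoxynucleotide are hypothetical and serve only to show the arithmetic.

```python
AREA_PER_GRAM_MM2 = 1130.0  # mm^2 of chip per gram, as given in the text


def chip_area_mm2(weight_g: float) -> float:
    """Chip area estimated from its weight."""
    return weight_g * AREA_PER_GRAM_MM2


def surface_density_fmol_per_mm2(bound_fmol: float, weight_g: float) -> float:
    """Bound oligodeoxynucleotide per unit area.

    The signal is halved because 32P is read through the wafer, so the
    screen records radioactivity from both sides of the chip.
    """
    return bound_fmol / chip_area_mm2(weight_g) / 2.0


# Hypothetical example: a 50 mg chip carrying 1130 fmol of bound oligo.
area = chip_area_mm2(0.050)
density = surface_density_fmol_per_mm2(1130.0, 0.050)
print(round(area, 1), round(density, 1))
```

For a 50 mg chip the estimated area is 56.5 mm^2, so the hypothetical 1130 fmol corresponds to 10 fmol/mm^2 after halving.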

Following the initial quantitation each chip was washed in 5
x SSC buffer (75 mM sodium citrate, 750 mM sodium chloride, pH 7) with
50% formamide at 65°C for 5 hours. Each chip was washed three times
with warm water, the 5 x SSC wash was repeated, and the chips
requantitated. Disulfide linked oligonucleotides were removed from the
chip by incubation with 100 mM DTT at 37°C for 5 hours.
Example 6 Attachment of Nucleic Acids to Streptavidin Coated Solid Support.

Immobilized single-stranded DNA targets for solid-phase
DNA sequencing were prepared by PCR amplification. PCR was performed
on a Perkin Elmer Cetus DNA Thermal Cycler using VentR (exo-) DNA
polymerase (New England Biolabs; Beverly, MA), and dNTP solutions
(Promega; Madison, WI). EcoR I digested plasmid NB34 (a pCR™ II
plasmid with a one kb target anonymous human DNA insert) was used as
the DNA template for amplification. PCR was performed with an 18-
nucleotide upstream primer and a downstream 5'-end biotinylated 18-
nucleotide primer. PCR amplification was carried out in a 100 ul or 400 ul
volume containing 10 mM KCl, 20 mM Tris-HCl (pH 8.8 at 25°C), 10 mM
(NH4)2SO4, 2 mM MgSO4, 0.1% Triton X-100, 250 uM dNTPs, 2.5 uM
biotinylated primer, 5 uM non-biotinylated primer, less than 100 ng of
plasmid DNA, and 6 units of Vent (exo-) DNA polymerase per 100 ul of
reaction volume. Thirty temperature cycles were performed which included
a heat denaturation step at 94°C for 3 minutes, followed by annealing of
primers to the template DNA for 1 minute at 60°C, and DNA chain
extension with Vent (exo-) polymerase for 1 minute at 72°C. For
amplification with the tagged primer, 45°C was selected for primer
annealing. The PCR product was purified through an Ultrafree-MC 30,000
NMWL filter unit (Millipore; Bedford, MA) or by electrophoresis and
extraction from a low melting agarose gel. About 10 pmol of purified PCR
fragment was mixed with 1 mg of prewashed magnetic beads coated with
streptavidin (Dynabeads M280, Dynal, Norway) in 100 ul of 1 M NaCl and
TE, incubating at 37°C or 45°C for 30 minutes.

The magnetic beads were used directly for double stranded
sequencing. For single stranded sequencing, the immobilized biotinylated
double-stranded DNA fragment was converted to single-stranded form by
treating with freshly prepared 0.1 M NaOH at room temperature for 5
minutes. The magnetic beads, with immobilized single-stranded DNA, were
washed with 0.1 M NaOH and TE before use.

Example 7 Hybridization Specificity.

Hybridization was performed using probes with five and six 
base pair overhangs, including a five base pair match, a five base pair 
mismatch, a six base pair match, and a six base pair mismatch. These 
sequences are depicted in Table 3. 
Table 3
Hybridized Test Sequences

Test Sequences:

5 bp overlap, perfect match:
    3'-TCG AGA ACC TTG GCT*-5'                       (SEQ ID NO 1)
    3'-CTA CTA GGC TGC GTA GTC                       (SEQ ID NO 2)
    5'-biotin-GAT GAT CCG ACG CAT CAG AGC TC-3'      (SEQ ID NO 3)

5 bp overlap, mismatch at 3' end:
    3'-TCG AGA ACC TTG GCT*-5'                       (SEQ ID NO 1)
    3'-CTA CTA GGC TGC GTA GTC                       (SEQ ID NO 2)
    5'-biotin-GAT GAT CCG ACG CAT CAG AGC TT-3'      (SEQ ID NO 4)

6 bp overlap, perfect match:
    3'-TCG AGA ACC TTG GCT*-5'                       (SEQ ID NO 1)
    3'-CTA CTA GGC TGC GTA GTC                       (SEQ ID NO 2)
    5'-biotin-GAT GAT CCG ACG CAT CAG AGC TCT-3'     (SEQ ID NO 5)

6 bp overlap, mismatch four bases from 3' end:
    3'-TCG AGA ACC TTG GCT*-5'                       (SEQ ID NO 1)
    3'-CTA CTA GGC TGC GTA GTC                       (SEQ ID NO 2)
    5'-biotin-GAT GAT CCG ACG CAT CAG AGT TCT-3'     (SEQ ID NO 6)

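
The match and mismatch labels in Table 3 can be checked by pairing each probe overhang with the 3' end of the target. The sketch below is illustrative only: it assumes the overhang is everything after the 18 base constant region, and it reads the fourth probe (SEQ ID NO 6) as ending in AGT TCT, which is an assumed reading of a character that is unclear in the source.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

TARGET_3_TO_5 = "TCGAGAACCTTGGCT"  # SEQ ID NO 1, written 3'->5' as in Table 3
CONSTANT_LEN = 18                  # length of the duplex (constant) region

PROBES_5_TO_3 = {
    "SEQ ID NO 3": "GATGATCCGACGCATCAGAGCTC",
    "SEQ ID NO 4": "GATGATCCGACGCATCAGAGCTT",
    "SEQ ID NO 5": "GATGATCCGACGCATCAGAGCTCT",
    "SEQ ID NO 6": "GATGATCCGACGCATCAGAGTTCT",  # assumed reading of the source
}


def mismatch_positions(probe_5_to_3: str, target_3_to_5: str) -> list[int]:
    """1-based overhang positions (from its 5' side) that fail to base pair.

    The overhang read 5'->3' lines up position by position with the target
    read 3'->5', because the two strands are antiparallel.
    """
    overhang = probe_5_to_3[CONSTANT_LEN:]
    return [
        i + 1
        for i, (p, t) in enumerate(zip(overhang, target_3_to_5))
        if COMPLEMENT[p] != t
    ]


results = {name: mismatch_positions(seq, TARGET_3_TO_5)
           for name, seq in PROBES_5_TO_3.items()}
print(results)
```

Under these assumptions the two perfect-match probes give no mismatches, SEQ ID NO 4 mismatches at the last overhang base (its 3' end), and SEQ ID NO 6 mismatches at the third overhang base, which is the fourth base from its 3' end.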

The biotinylated double-stranded probe was prepared in TE
buffer by annealing the complementary single strands together at 68°C for
five minutes followed by slow cooling to room temperature. A five-fold
excess of monodisperse, polystyrene-coated magnetic beads (Dynal) coated
with streptavidin was added to the double-stranded probe, which was then
incubated with agitation at room temperature for 30 minutes. After ligation,
the samples were subjected to two cold (4°C) washes followed by one hot
(90°C) wash in TE buffer (Figure 10). The ratio of 32P in the hot
supernatant to the total amount of 32P was determined (Figure 11). At high
NaCl concentrations, mismatched target sequences were either not annealed
or were removed in the cold washes. Under the same conditions, the
matched target sequences were annealed and ligated to the probe. The final
hot wash removed the non-biotinylated probe oligonucleotide. This
oligonucleotide contained the labeled target if the target had been ligated to
the probe.

Example 8 Compensating for Variations in Base Composition.

The dependence of Tm on base composition and on base
sequence may be overcome with the use of salts like tetramethyl ammonium
halides or betaines. Alternatively, base analogs like 2,6-diamino purine and
5-bromo U can be used instead of A and T, respectively, to increase the
stability of A-T base pairs, and derivatives like 7-deazaG can be used to
decrease the stability of G-C base pairs. The initial experiments shown in
Table 2 indicate that the use of enzymes will eliminate many of the
complications due to base sequences. This gives the approach a very
significant advantage over non-enzymatic methods which require different
conditions for each nucleic acid and are highly matched to GC content.

Another approach to compensate for differences in stability is
to vary the base next to the stacking site. Experiments were performed to
test the relative effects of all four bases in this position on overall
hybridization discrimination and also on relative ligation discrimination.
Other base analogs such as dU (deoxyuridine) and 7-deazaG may also be
useful to suppress effects of secondary structure.

Example 9 DNA Ligation to Oligonucleotide Arrays.

E. coli and T4 DNA ligases can be used to covalently attach
hybridized target nucleic acid to the correct immobilized oligonucleotide
probe. This is a highly accurate and efficient process. Because ligase
absolutely requires a correctly base paired 3' terminus, ligase will read only
the 3'-terminal sequence of the target nucleic acid. After ligation, the
resulting duplex will be 23 base pairs long and it will be possible to remove
unhybridized, unligated target nucleic acid using fairly stringent washing
conditions. Appropriately chosen positive and negative controls
demonstrate the specificity of this method, such as arrays which are lacking
a 5'-terminal phosphate adjacent to the 3' overhang, since these probes will
not ligate to the target nucleic acid.

There are a number of advantages to a ligation step. Physical
specificity is supplanted by enzymatic specificity. Focusing on the 3' end
of the target nucleic acid also minimizes problems arising from stable secondary
structures in the target DNA. DNA ligases are also used to covalently attach
hybridized target DNA to the correct immobilized oligonucleotide probe.
Several tests of the feasibility of the ligation method are shown in Figure 12.
Biotinylated probes were attached at 5' ends (Figure 12A) or 3' ends (Figure
12B) to streptavidin-coated magnetic microbeads, and annealed with a
shorter, complementary, constant sequence to produce duplexes with 5 or
6 base single-stranded overhangs. 32P-end labeled targets were allowed to
hybridize to the probes. Free targets were removed by capturing the beads
with a magnetic separator. DNA ligase was added and ligation was allowed
to proceed at various salt concentrations. The samples were washed at room
temperature, again manipulating the immobilized compounds with a
magnetic separator to remove non-ligated material. Finally, samples were
incubated at a temperature above the Tm of the duplexes, and eluted single
strand was retained after the remainder of the samples were removed by
magnetic separation. The eluate at this point consisted of the ligated
material. The fraction of ligation was estimated as the amount of 32P
recovered in the high temperature wash versus the amount recovered in both
the high and low temperature washes. Results indicated that salt conditions
can be found where the ligation proceeds efficiently with perfectly matched
5 or 6 base overhangs, but not with G-T mismatches. The results of a more
extensive set of similar experiments are shown in Tables 4-6.
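
The ligation fraction defined above is a simple ratio of recovered label. A minimal sketch follows; the function name and the example counts are hypothetical and chosen only to illustrate the calculation.

```python
def ligation_fraction(hot_wash_counts: float, cold_wash_counts: float) -> float:
    """Fraction of 32P recovered in the high temperature wash.

    hot_wash_counts: label eluted above the Tm (ligated material).
    cold_wash_counts: label removed in the low temperature washes.
    """
    total = hot_wash_counts + cold_wash_counts
    if total <= 0:
        raise ValueError("total recovered counts must be positive")
    return hot_wash_counts / total


# Hypothetical counts, for illustration only.
example = ligation_fraction(hot_wash_counts=170.0, cold_wash_counts=830.0)
print(example)
```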

Table 4 looks at the effect of the position of the mismatch and
Table 5 examines the effect of base composition on the relative
discrimination of perfect matches versus weakly destabilizing mismatches.
These data demonstrate that effective discrimination between perfect
matches and single mismatches occurs with all five base overhangs tested
and that there is little if any effect of base composition on the amount of
ligation seen or the effectiveness of match/mismatch discrimination. Thus,
the serious problems of dealing with base composition effects on stability
seen in ordinary SBH do not appear to be a problem for positional SBH.
Furthermore, as the worst mismatch position was the one distal from the
phosphodiester bond formed in the ligation reaction, any mismatches that
survived in this position would be eliminated by a polymerase extension
reaction. A polymerase such as Sequenase version 2, which has no 3'-
endonuclease activity or terminal transferase activity, would be useful in this
regard. Gel electrophoresis analysis confirmed that the putative ligation
products seen in these tests were indeed the actual products synthesized.

Table 4
Ligation Efficiency of Matched and Mismatched Duplexes
in 0.2 M NaCl at 37°C

(SEQ ID NO 1) 3'-TCG AGA ACC TTG GCT-5'
CTA CTA GGC TGC GTA GTC-5' (SEQ ID NO 2)

Probe                                     Ligation Efficiency
5'-B- GAT GAT CCG ACG CAT CAG AGC TC      0.170   (SEQ ID NO 3)
5'-B- GAT GAT CCG ACG CAT CAG AGC TT      0.006   (SEQ ID NO 4)
5'-B- GAT GAT CCG ACG CAT CAG AGC TA      0.006   (SEQ ID NO 7)
5'-B- GAT GAT CCG ACG CAT CAG AGC CC      0.002   (SEQ ID NO 8)
5'-B- GAT GAT CCG ACG CAT CAG AGT TC      0.004   (SEQ ID NO 9)
5'-B- GAT GAT CCG ACG CAT CAG AAC TC      0.001   (SEQ ID NO 10)



Table 5
Ligation Efficiency of Matched and Mismatched Duplexes in
0.2 M NaCl at 37°C and its Dependence on AT Content of the
Overhang

            Overhang Sequences   AT Content   Ligation Efficiency
Match       GGCCC                0/5          0.30
Mismatch    GGCCT                             0.03

Match       AGCCC                1/5          0.36
Mismatch    AGCTC                             0.02

Match       AGCTC                2/5          0.17
Mismatch    AGCTT                             0.01

Match       AGATC                3/5          0.24
Mismatch    AGATT                             0.01

Match       ATATC                4/5          0.17
Mismatch    ATATT                             0.01

Match       ATATT                5/5          0.31
Mismatch    ATATC                             0.02
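
The match/mismatch discrimination implied by Table 5 can be tabulated directly from its values. The sketch below simply divides each matched efficiency by the paired mismatched efficiency and recomputes the AT content of each matched overhang; the variable and function names are illustrative.

```python
# (matched overhang, mismatched overhang, matched efficiency, mismatched efficiency)
TABLE_5 = [
    ("GGCCC", "GGCCT", 0.30, 0.03),
    ("AGCCC", "AGCTC", 0.36, 0.02),
    ("AGCTC", "AGCTT", 0.17, 0.01),
    ("AGATC", "AGATT", 0.24, 0.01),
    ("ATATC", "ATATT", 0.17, 0.01),
    ("ATATT", "ATATC", 0.31, 0.02),
]


def at_content(seq: str) -> int:
    """Number of A or T bases in the overhang."""
    return sum(base in "AT" for base in seq)


rows = [
    (match, at_content(match), round(eff_match / eff_mismatch, 1))
    for match, _mismatch, eff_match, eff_mismatch in TABLE_5
]
for match, at, ratio in rows:
    print(f"{match}  AT={at}/5  discrimination x{ratio}")
```

The ratios fall between roughly 10 and 24 across the full range of AT content, which is consistent with the statement that base composition has little effect on discrimination.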
Table 6
Increasing Discrimination by Sequencing Extension at 37°C

                                          Ligation      Ligation Extension (cpm)
                                          Efficiency    (+)           (-)
                                          (percent)

(SEQ ID NO 1) 3'-TCG AGA ACC TTG GCT-5'*
CTA CTA GGC TGC GTA GTC-5' (SEQ ID NO 2)
5'-B- GAT GAT CCG ACG CAT CAG AGA TC      0.24          4,934         29,500
(SEQ ID NO 11)
5'-B- GAT GAT CCG ACG CAT CAG AGC TT      0.01          [illegible]   250
(SEQ ID NO 4)
Discrimination =                          x24           x42           x118

(SEQ ID NO 1) 3'-TCG AGA ACC TTG GCT-5'*
CTA CTA GGC TGC GTA GTC-5' (SEQ ID NO 2)
5'-B- GAT GAT CCG ACG CAT CAG ATA TC      0.17          12,250        25,200
(SEQ ID NO 12)
5'-B- GAT GAT CCG ACG CAT CAG ATA TT      0.01          [illegible]   390
(SEQ ID NO 13)
Discrimination =                          x17           x51           x65

"B" = Biotin "*" = radioactive label


The discrimination for the correct sequence is not as great with
an external mismatch (which would be the most difficult case to
discriminate) as with an internal mismatch (Table 6). A mismatch right at
the ligation point would presumably offer the highest possible
discrimination. In any event, the results shown are very promising. Already
there is a level of discrimination with only 5 or 6 bases of overlap that is
better than the discrimination seen in conventional SBH with 8 base
overlaps.

Example 10 Capture and Sequencing of a Target Nucleic Acid.

A mixture of target DNA was prepared by mixing equimolar
amounts of eight different oligos. For each sequencing reaction, one specific
partially duplex probe and eight different targets were used. The sequences
of the probes and the targets are shown in Tables 7 and 8.
Table 7
Duplex Probes Used

(DF25)   5'-F-GATGATCCGACGCATCAGCI<jI£I    (SEQ ID NO 14)
         3'-CTACTAGGCTGCGTAGTC             (SEQ ID NO 2)

(DF37)   5'-F-GATGATCCGACGCATCACI£AA£      (SEQ ID NO 15)
         3'-CTACTAGGCTGCGTAGTG             (SEQ ID NO 2)

(DF22)   5'-F-GATGATCCGACGCATCAGAAIGj:     (SEQ ID NO 16)
         3'-CTACTAGGCTGCGTAGTC             (SEQ ID NO 2)

(DF28)   5'-F-GATGATCCGACGCATCAG££IAfi     (SEQ ID NO 17)
         3'-CTACTAGGCTGCGTAGTC             (SEQ ID NO 2)

(DF36)   5'-F-GATGATCCGACGCATCAGI£Gj^£     (SEQ ID NO 18)
         3'-CTACTAGGCTGCGTAGTC             (SEQ ID NO 2)

(DF11a)  5'-F-GATGATCCGACGCATCACACx£I£     (SEQ ID NO 19)
         3'-CTACTAGGCTGCGTAGTG             (SEQ ID NO 2)

(DF8a)   5'-F-GATGATCCGACGCATCAAGJj£C_£    (SEQ ID NO 20)
         3'-CTACTAGGCTGCGTAGTT             (SEQ ID NO 2)


Table 8
Mixture of Targets

(NB4)     3'-UA£ACCGGATCGAGCCGGGTCGATCTAG (DF22)      (SEQ ID NO 21)
(NB4.5)   3'-GJIAJC,GACCGGGTCGATCTAG (DF28)           (SEQ ID NO 22)
(DF5)     3'-A2£IG_CCGGATCGAGCCGGGTCGATCTAG (DF36)    (SEQ ID NO 23)
(TS10)    3'-ICjQAijAACCTTGGCT (DF11a)                (SEQ ID NO 24)
(NB3.10)  3'-C££G_GTCGATCTAG (DF8a)                   (SEQ ID NO 25)

Mismatch

(NB3.4)   3'-Ci2jQG_ATCAAGCCGGGTCGATCTAG (DF8a)       (SEQ ID NO 26)
(NB3.7)   3'-I£AAG_CCGGGTCGATCTAG (DF11a)             (SEQ ID NO 27)
(NB3.9)   3'-A^i££G.GGTCGATCTAG (DF36)                (SEQ ID NO 28)


Two pmol of each of the two duplex-probe-forming
oligonucleotides and 1.5 pmol of each of the eight different targets were
mixed in a 10 ul volume containing 2 ul of Sequenase buffer stock (200 mM
Tris-HCl, pH 7.5, 100 mM MgCl2, and 250 mM NaCl) from the Sequenase
kit. The annealing mixture was heated to 65°C and allowed to cool slowly
to room temperature. While the reaction mixture was kept on ice, 1 ul 0.1
M dithiothreitol solution, 1 ul Mn buffer (0.15 M sodium isocitrate and 0.1
M MnCl2), and 2 ul of diluted Sequenase (1.5 units) were mixed, and 2
ul of the reaction mixture was added to each of the four termination mixes at
room temperature (each consisting of 3 ul of the appropriate termination
mix: 16 uM dATP, 16 uM dCTP, 16 uM dGTP, 16 uM dTTP and 3.2 uM
of one of the four ddNTPs, in 50 mM NaCl). The reaction mixtures were
further incubated at room temperature for 5 minutes, and terminated with the
addition of 4 ul of Pharmacia stop mix (deionized formamide containing
dextran blue 6 mg/ml). Samples were denatured at 90-95°C for 3 minutes
and stored on ice prior to loading. Sequencing samples were analyzed on
an ALF DNA sequencer (Pharmacia Biotech; Piscataway, NJ) using a 10%
polyacrylamide gel containing 7 M urea and 0.6 x TBE. Sequencing results
from the gel reader are shown in Figure 13 and summarized in Table 9.
Matched targets hybridized correctly and are sequenced, whereas
mismatched targets do not hybridize and are not sequenced.
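
The ddNTP to dNTP ratio in a termination mix sets how often chains stop at a given base. The sketch below is a rough illustration, not part of the described protocol: it assumes the polymerase incorporates a ddNTP and the corresponding dNTP with equal efficiency (the relative_dd_efficiency parameter is a hypothetical knob for relaxing that), in which case the chance of termination at each occurrence of the base is ddNTP/(ddNTP + dNTP). Diluting the mix into the reaction scales both concentrations equally and leaves this ratio unchanged.

```python
def termination_probability(ddntp_um: float, dntp_um: float,
                            relative_dd_efficiency: float = 1.0) -> float:
    """Per-site chance that a chain terminates at a given base.

    Assumption: incorporation is proportional to concentration, weighted
    by relative_dd_efficiency for the dideoxy nucleotide.
    """
    dd = ddntp_um * relative_dd_efficiency
    return dd / (dd + dntp_um)


def mean_sites_to_termination(p: float) -> float:
    """Mean number of occurrences of the base up to and including the stop."""
    return 1.0 / p


# Termination mix from this example: 16 uM of each dNTP, 3.2 uM of one ddNTP.
p_stop = termination_probability(ddntp_um=3.2, dntp_um=16.0)
mean_sites = mean_sites_to_termination(p_stop)
print(round(p_stop, 3), round(mean_sites, 1))
```

Under the equal-efficiency assumption, about one occurrence in six of the relevant base terminates the chain, which favors short ladders suited to the short oligonucleotide targets used here.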
Table 9
Summary of Hybridization Data

Reaction   Hybridization                   Sequence   Comment
1          Probe: DF25 Target: mixture     No         mismatch
2          Probe: DF37 Target: mixture     No         mismatch
3          Probe: DF22 Target: mixture     Yes        match
4          Probe: DF28 Target: mixture     Yes        match
5          Probe: DF36 Target: mixture     Yes        match
6          Probe: DF11a Target: mixture    Yes        match
7          Probe: DF8a Target: mixture     Yes        match
8          Probe: DF8a Target: NB3.4       No         mismatch
9          Probe: DF8a Target: TS12        No         mismatch
10         Probe: DF37 Target: DF5         No         mismatch


Example 11 Elongation of Nucleic Acids Bound to Solid Supports.

Elongation was carried out using either a Sequenase Version
2.0 kit or an AutoRead sequencing kit (Pharmacia Biotech; Piscataway, NJ)
employing T7 DNA polymerase. Elongation of the immobilized single-
stranded DNA target was performed with reagents from the sequencing kits
for Sequenase Version 2.0 or T7 DNA polymerase. A duplex DNA probe
containing a 5-base 3' overhang was used as a primer. The duplex has a 5'-
fluorescein labeled 23-mer, containing an 18-base 5' constant region and a
5-base 3' variable region (which has the same sequence as the 5'-end of the
corresponding nonbiotinylated primer for PCR amplification of target DNA),
and an 18-mer complementary to the constant region of the 23-mer. The
duplex was formed by annealing 20 pmol of each of the two
oligonucleotides in a 10 ul volume containing 2 ul of Sequenase buffer
stock (200 mM Tris-HCl, pH 7.5, 100 mM MgCl2, and 250 mM NaCl) from
the Sequenase kit or in a 13 ul volume containing 2 ul of the annealing
buffer (1 M Tris-HCl, pH 7.6, 100 mM MgCl2) from the AutoRead
sequencing kit. The annealing mixture was heated to 65°C and allowed to
cool slowly to 37°C over a 20-30 minute time period. The duplex primer
was annealed with the immobilized single-stranded DNA target by adding
the annealing mixture to the DNA-containing magnetic beads and the
resulting mixture was further incubated at 37°C for 5 minutes, room
temperature for 10 minutes, and finally 0°C for at least 5 minutes. For
Sequenase reactions, 1 ul 0.1 M dithiothreitol solution, 1 ul Mn buffer (0.15
M sodium isocitrate and 0.1 M MnCl2) for the relatively short target, and 2 ul
of diluted Sequenase (1.5 units) were added, and the reaction mixture was
divided into four ice cold termination mixes (each consisting of 3 ul of the
appropriate termination mix: 80 uM dATP, 80 uM dCTP, 80 uM dGTP, 80
uM dTTP and 8 uM of one of the four ddNTPs, in 50 mM NaCl). For T7
DNA polymerase reactions, 1 ul of extension buffer (40 mM MgCl2, pH 7.5,
304 mM citric acid and 324 mM DTT) and 1 ul of T7 DNA polymerase (8
units) were mixed, and the reaction volume was split into four ice cold
termination mixes (each consisting of 1 ul DMSO and 3 ul of the
appropriate termination mix: 1 mM dATP, 1 mM dCTP, 1 mM dGTP, 1
mM dTTP and 5 uM of one of the four ddNTPs, in 50 mM NaCl and 40 mM
Tris-HCl, pH 7.4). The reaction mixtures for both enzymes were further
incubated at 0°C for 5 minutes, room temperature for 5 minutes and 37°C
for 5 minutes. After the completion of extension, the supernatant was
removed, and the magnetic beads were re-suspended in 10 ul of Pharmacia
stop mix. Samples were denatured at 90-95°C for 5 minutes (under this
harsh condition, both DNA template and the dideoxy fragments are released
from the beads) and stored on ice prior to loading. A control experiment
was performed in parallel using an 18-mer complementary to the 3' end of


WO 96/32504 



PCT/US96/05136 



target DNA as the sequencing primer instead of the duplex probe and the 
annealing of 18-mer to its target was carried out in a similar way as the 
annealing of the duplex probe. 
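
The quantities given in Example 11 imply a few derived numbers that are
easy to check. The following is only an illustrative arithmetic sketch;
the function and variable names are hypothetical and are not part of
either kit protocol.

```python
def concentration_uM(amount_pmol: float, volume_ul: float) -> float:
    """Concentration in uM. One pmol per ul is numerically one uM."""
    return amount_pmol / volume_ul


def ddntp_to_dntp_ratio(ddntp_uM: float, dntp_uM: float) -> float:
    """Molar ratio of a chain terminator to its matching dNTP."""
    return ddntp_uM / dntp_uM


# Duplex annealing: 20 pmol of each oligonucleotide in 10 ul or 13 ul.
sequenase_duplex_uM = concentration_uM(20, 10)
autoread_duplex_uM = concentration_uM(20, 13)

# Termination mixes: 8 uM ddNTP with 80 uM dNTP, or 5 uM ddNTP with 1 mM dNTP.
sequenase_ratio = ddntp_to_dntp_ratio(8, 80)
t7_ratio = ddntp_to_dntp_ratio(5, 1000)

print(f"duplex: {sequenase_duplex_uM:.2f} uM (Sequenase), "
      f"{autoread_duplex_uM:.2f} uM (AutoRead)")
print(f"ddNTP:dNTP = {sequenase_ratio:.3f} (Sequenase), {t7_ratio:.3f} (T7)")
```

The ratios describe the termination mixes as formulated, before they
are combined with the rest of the reaction.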

Example 12 Chain Elongation of Target Sequences. 
Sequencing of immobilized target DNA can be performed with Sequenase
Version 2.0. A total of 5 elongation reactions, one with each of 4
dideoxy nucleotides and one with all four simultaneously, are
performed. A sequencing solution containing 40 mM Tris-HCl, pH 7.5,
20 mM MgCl2, 50 mM NaCl, 10 mM dithiothreitol, 15 mM sodium isocitrate,
10 mM MnCl2 and 100 u/ml of Sequenase (1.5 units) is added to the
hybridized target DNA. dATP, dCTP, dGTP and dTTP are added to 20 uM to
initiate the elongation reaction. In the separate reactions, one of the
four ddNTPs is added to reach a concentration of 8 uM. In the combined
reaction, all four ddNTPs are added to 8 uM each. The reaction mixtures
are incubated at 0°C for 5 minutes, room temperature for 5 minutes and
37°C for 5 minutes. After the completion of extension, the supernatant
is removed and the elongated DNA washed with 2 mM EDTA to terminate the
elongation reactions. Reaction products are analyzed by mass
spectrometry.
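
As a rough illustration of why these concentrations yield a ladder of
fragments rather than a single product, the sketch below uses a
deliberately simplified model. It assumes that a ddNTP and its dNTP
compete in proportion to their concentrations, scaled by a hypothetical
relative_efficiency factor that is not given in the text. Real
incorporation rates depend on the enzyme and buffer, so the numbers are
indicative only.

```python
def termination_probability(ddntp_uM: float, dntp_uM: float,
                            relative_efficiency: float = 1.0) -> float:
    """Chance that a chain stops at one template position calling for this base.

    relative_efficiency is an assumed ddNTP/dNTP incorporation ratio.
    """
    weighted_dd = relative_efficiency * ddntp_uM
    return weighted_dd / (weighted_dd + dntp_uM)


def fraction_terminated_at(k: int, p: float) -> float:
    """Fraction of chains that stop exactly at the k-th occurrence (k >= 1)."""
    return (1.0 - p) ** (k - 1) * p


# Concentrations from Example 12: 8 uM ddNTP against 20 uM dNTP.
p = termination_probability(8.0, 20.0)
ladder = [fraction_terminated_at(k, p) for k in range(1, 6)]

for k, fraction in enumerate(ladder, start=1):
    print(f"occurrence {k}: {fraction:.3f} of chains terminate here")
```

Under this model each successive occurrence of the base collects a
smaller share of the chains, which is the expected shape of a Sanger
ladder.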



Example 13 Capillary Electrophoretic Analysis of Target Nucleic Acid.

Molecular weights of target sequences may also be determined by
capillary electrophoresis. A single-laser capillary electrophoresis
instrument can be used to monitor the performance of sample
preparations in high performance capillary electrophoresis sequencing.
This instrument is designed so that it is easily converted to multiple
channel (multiple wavelength) detection.

An individual element of the sample array may be engineered directly to
serve as the sample input to a capillary. Typical capillaries are 250
microns o.d. and 75 microns i.d. The sample is heated or denatured to
release the DNA ladder into a liquid droplet; the silicon array surface
is ideal for this purpose. The capillary can be brought into contact
with the droplet to load the sample.

There are two basic methods to facilitate loading of large numbers of
samples simultaneously or sequentially. In the first, with 250 micron
o.d. capillaries it is feasible to match the dimensions of the target
array and the capillary array. The two can then be brought into contact
manually or even by a robot arm using a jig to assure accurate
alignment. An electrode may be engineered directly into each sector of
the silicon surface so that sample loading would only require contact
between the surface and the capillary array.

The second method is based on an inexpensive collection system to
capture fractions eluted from high performance capillary
electrophoresis. Dilution is avoided by using designs which allow
sample collection without a perpendicular sheath flow. The same
apparatus designed as a sample collector can also serve inversely as a
sample loader. In this case, each row of the sample array, equipped
with electrodes, is used directly to load samples automatically on a
row of capillaries. Using either method, sequence information is
determined and the target sequence constructed.

Example 14 Mass Spectrometry of Nucleic Acids.

Nucleic acids to be analyzed by mass spectrometry were redissolved in
ultrapure water (MilliQ, Millipore) using amounts to obtain a
concentration of 10 pmoles/ul as stock solution. An aliquot (1 ul) of
this stock solution or a dilution in ultrapure water was mixed with
1 ul of the matrix solution on a flat metal surface serving as the
probe tip and dried with a fan using cold air. In some experiments,
cation-exchange beads in the acid form were added to the mixture of
matrix and sample solution to stabilize ions formed during analysis.

MALDI-TOF spectra were obtained on different commercial instruments
such as Vision 2000 (Finnigan-MAT), VG TofSpec (Fisons Instruments) and
LaserTec Research (Vestec). The conditions were linear negative ion
mode with an acceleration voltage of 25 kV. Mass calibration was done
externally and generally achieved by using defined peptides of
appropriate mass range such as insulin, gramicidin S, trypsinogen,
bovine serum albumin and cytochrome C. All spectra were generated by
employing a nitrogen laser with 5 nanosecond pulses at a wavelength of
337 nm. Laser energy varied between 10^6 and 10^7 W/cm^2. To improve
the signal-to-noise ratio, the intensities of 10 to 30 laser shots were
generally accumulated. The output of a typical mass spectrometry
analysis showing discrimination between nucleic acids which differ by
one base is shown in Figure 14.

Example 15 Sequence Determination from Mass Spectrometry.

Elongation of a target nucleic acid, in the presence of dideoxy chain
terminating nucleotides, generated four families of chain-terminated
fragments. The mass difference per nucleotide addition is 289.19 for
dpC, 313.21 for dpA, 329.21 for dpG and 304.20 for dpT, respectively.
By comparing the mass differences measured between fragments with the
known masses of each nucleotide, the nucleic acid sequence can be
determined. Nucleic acid may also be sequenced by performing polymerase
chain elongation in four separate reactions, each with one dideoxy
chain terminating nucleotide. To examine mass differences, 13
oligonucleotides from 7 to 50 bases in length were analyzed by
MALDI-TOF mass spectrometry. The correlation of calculated molecular
weights of the ddT fragments of a Sanger sequencing reaction and their
experimentally verified weights is shown in Table 10. When the mass
spectrometry data from all four chain termination reactions are
combined, the molecular weight difference between two adjacent peaks
can be used to determine the sequence.

Table 10

Summary of Molecular Weights Expected v. Measured

Fragment (n-mer)   Calculated Mass   Experimental Mass   Difference

 7-mer                 2104.45             2119.9           +15.4
10-mer                 3011.04             3026.1           +15.1
11-mer                 3315.24             3330.1           +14.9
19-mer                 5771.82             5788.0           +16.2
20-mer                 6076.02             6093.8           +17.8
24-mer                 7311.82             7374.9           +63.1
26-mer                 7945.22             7960.9           +15.7
33-mer                10112.63            10125.3           +12.7
37-mer                11348.43            11361.4           +13.0
38-mer                11652.62            11670.2           +17.6
42-mer                12872.42            12888.3           +15.9
46-mer                14108.22            14125.0           +16.8
50-mer                15344.02            15362.6           +18.6
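
The readout described in Example 15, in which the mass difference
between adjacent peaks of the combined ladder identifies each added
nucleotide, can be sketched as follows. This is a minimal illustration
and not the analysis software used for the experiments. The per-base
masses are those quoted in Example 15, whereas the 2.0 dalton
tolerance, the function names and the example ladder (built on an
arbitrary primer mass of 5000.0) are assumptions made for the sketch.

```python
# Mass added per nucleotide (daltons), as quoted in Example 15.
NUCLEOTIDE_MASS = {"C": 289.19, "A": 313.21, "G": 329.21, "T": 304.20}


def call_base(delta, tolerance=2.0):
    """Return the base whose mass is nearest to delta, or None if too far off."""
    base, mass = min(NUCLEOTIDE_MASS.items(), key=lambda kv: abs(kv[1] - delta))
    return base if abs(mass - delta) <= tolerance else None


def read_sequence(peak_masses, tolerance=2.0):
    """Read bases from the differences between adjacent peaks of a ladder."""
    peaks = sorted(peak_masses)
    calls = []
    for lighter, heavier in zip(peaks, peaks[1:]):
        base = call_base(heavier - lighter, tolerance)
        calls.append(base if base is not None else "N")  # N marks an uncalled step
    return "".join(calls)


# Hypothetical ladder: an arbitrary primer mass extended by G, A, T, C.
ladder = [5000.0]
for added in "GATC":
    ladder.append(ladder[-1] + NUCLEOTIDE_MASS[added])

print(read_sequence(ladder))  # prints GATC
```

Because only differences between peaks are used, a calibration offset
that is the same for both peaks cancels. The closest pair of
nucleotide masses (dpA and dpT) differs by about 9 daltons, so the
tolerance must stay well below half of that.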



Example 16 Reduced Pass Sequencing.

To maximize the use of PSBH arrays to produce Sanger ladders, the
sequence of a target should be covered as completely as possible with
the lowest amount of initial sequencing redundancy. This will maximize
the performance of individual elements of the arrays and maximize the
amount of useful sequence data obtained each time an array is used.
With an unknown DNA, a full array of 1024 elements (Mwo I or BsiY I
cleavage) or 256 elements (TspR I cleavage) is used. A 50 kb target DNA
is cut into about 64 fragments by Mwo I or BsiY I or 30 fragments by
TspR I, respectively. Each fragment has two ends, both of which can be
captured independently. The coverage of each array after capture and
ignoring degeneracies is 128/1024 sites in the first case and 60/256
sites in the second case. Direct use of such an array to blindly
deliver samples element by element for mass spectrometry sequencing
would be inefficient since most array elements will have no samples.
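
The coverage figures above follow from simple counting. The sketch
below reproduces them, assuming (as the text does by ignoring
degeneracies) that every captured fragment end occupies a distinct
array element. The function names are illustrative only.

```python
def array_coverage(fragments: int, elements: int) -> float:
    """Fraction of array elements occupied when every end lands on its own element."""
    captured_ends = 2 * fragments  # each fragment has two independently captured ends
    return captured_ends / elements


def mean_fragment_length(target_bp: int, fragments: int) -> float:
    """Average fragment length for a target cut into the given number of pieces."""
    return target_bp / fragments


TARGET_BP = 50_000  # the 50 kb target of Example 16
for enzyme, fragments, elements in [("Mwo I or BsiY I", 64, 1024),
                                    ("TspR I", 30, 256)]:
    coverage = array_coverage(fragments, elements)
    length = mean_fragment_length(TARGET_BP, fragments)
    print(f"{enzyme}: {2 * fragments}/{elements} elements "
          f"({coverage:.1%}), mean fragment about {length:.0f} bp")
```

With these inputs the occupied fraction is 12.5% for the 1024-element
array and about 23.4% for the 256-element array, which is why
element-by-element delivery would mostly sample empty sites.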

In one method, phosphatased double-stranded targets are used at high
concentrations to saturate each array element that detects a sample.
The target is ligated to make the capture irreversible. Next, a
different sample mixture is exposed to the array and subsequently
ligated in place. This process is repeated four or five times until
most of the elements of the array contain a unique sample. Any tandem
target-target complexes will be removed by a subsequent ligating step
because all of the targets are phosphatased.

Alternatively, the array may be monitored by confocal microscopy after
the elongation reactions. This reveals which elements contain elongated
nucleic acids, and this information is communicated to an automated
robotic system that is ultimately used to load the samples onto a mass
spectrometry analyzer.

Example 17 Synthesis of Mass Modified Nucleic Acid Primers.

Mass modification at the 5' sugar: Oligonucleotides were synthesized by
standard automated DNA synthesis using β-cyanoethylphosphoamidites and
a 5'-amino group introduced at the end of solid phase DNA synthesis.
The total amount of an oligonucleotide synthesis, starting with 0.25
micromoles CPG-bound nucleoside, is deprotected with concentrated
aqueous ammonia, purified via OligoPAK™ Cartridges (Millipore; Bedford,
MA) and lyophilized. This material with a 5'-terminal amino group is
dissolved in 100 ul absolute N,N-dimethylformamide (DMF) and condensed
with 10 umole N-Fmoc-glycine pentafluorophenyl ester for 60 minutes at
25°C. After ethanol precipitation and centrifugation, the Fmoc group is
cleaved off by a 10 minute treatment with 100 ul of a solution of 20%
piperidine in N,N-dimethylformamide. Excess piperidine, DMF and the
cleavage product from the Fmoc group are removed by ethanol
precipitation and the precipitate lyophilized from 10 mM TEAA buffer
pH 7.2. This material is now either used as primer for the Sanger DNA
sequencing reactions or one or more glycine residues (or other suitable
protected amino acid active esters) are added to create a series of
mass-modified primer oligonucleotides suitable for Sanger DNA or RNA
sequencing.

Mass modification at the heterocyclic base with glycine:

Starting material was
5-(3-aminopropynyl-1)-3',5'-di-p-tolyldeoxyuridine prepared and
3',5'-de-O-acylated (Haralambidis et al., Nuc. Acids Res. 15:4857-76,
1987). 0.281 g (1.0 mmol) 5-(3-aminopropynyl-1)-2'-deoxyuridine was
reacted with 0.927 g (2.0 mmol) N-Fmoc-glycine pentafluorophenylester
in 5 ml absolute N,N-dimethylformamide in the presence of 0.129 g
(1 mmol; 174 ul) N,N-diisopropylethylamine for 60 minutes at room
temperature. Solvents were removed by rotary evaporation and the
product was purified by silica gel chromatography (Kieselgel 60, Merck;
column: 2.5 x 50 cm, elution with chloroform/methanol mixtures). Yield
was 0.44 g (0.78 mmol; 78%). To add another glycine residue, the Fmoc
group is removed with a 20 minute treatment with a 20% solution of
piperidine in DMF, evaporated in vacuo and the remaining solid material
extracted three times with 20 ml ethylacetate. After having removed the
remaining ethylacetate, N-Fmoc-glycine pentafluorophenylester is
coupled as described above.
5-(3-(N-Fmoc-glycyl)-amidopropynyl-1)-2'-deoxyuridine is transformed
into the 5'-O-dimethoxytritylated
nucleoside-3'-O-β-cyanoethyl-N,N-diisopropylphosphoamidite and
incorporated into automated oligonucleotide synthesis. This glycine
modified thymidine analogue building block for chemical DNA synthesis
can be used to substitute one or more of the thymidine/uridine
nucleotides in the nucleic acid primer sequence. The Fmoc group is
removed at the end of the solid phase synthesis with a 20 minute
treatment with a 20% solution of piperidine in DMF at room temperature.
DMF is removed by a washing step with acetonitrile and the
oligonucleotide deprotected and purified.

Mass modification at the heterocyclic base with β-alanine:

0.281 g (1.0 mmol) 5-(3-aminopropynyl-1)-2'-deoxyuridine was reacted
with N-Fmoc-β-alanine pentafluorophenylester (0.955 g; 2.0 mmol) in
5 ml N,N-dimethylformamide (DMF) in the presence of 0.129 g (174 ul;
1.0 mmol) N,N-diisopropylethylamine for 60 minutes at room temperature.
Solvents were removed and the product purified by silica gel
chromatography. Yield was 0.425 g (0.74 mmol; 74%). Another β-alanine
moiety can be added in exactly the same way after removal of the Fmoc
group. The preparation of the 5'-O-dimethoxytritylated
nucleoside-3'-O-β-cyanoethyl-N,N-diisopropylphosphoamidite from
5-(3-(N-Fmoc-β-alanyl)-amidopropynyl-1)-2'-deoxyuridine and
incorporation into automated oligonucleotide synthesis is performed
under standard conditions. This building block can substitute for any
of the thymidine/uridine residues in the nucleic acid primer sequence.

Mass modification at the heterocyclic base with ethylene glycol
monomethyl ether: 5-(3-aminopropynyl-1)-2'-deoxyuridine was used as a
nucleosidic component in this example. 7.61 g (100.0 mmol) freshly
distilled ethylene glycol monomethyl ether dissolved in 50 ml absolute
pyridine was reacted with 10.01 g (100.0 mmol) recrystallized succinic
anhydride in the presence of 1.22 g (10.0 mmol)
4-N,N-dimethylaminopyridine overnight at room temperature. The reaction
was terminated by the addition of water (5.0 ml), the reaction mixture
evaporated in vacuo, co-evaporated twice with dry toluene (20 ml each)
and the residue redissolved in 100 ml dichloromethane. The solution was
extracted twice with 10% aqueous citric acid (2 x 20 ml) and once with
water (20 ml) and the organic phase dried over anhydrous sodium
sulfate. The organic phase was evaporated in vacuo. The residue was
redissolved in 50 ml dichloromethane and precipitated into 500 ml
pentane and the precipitate dried in vacuo. Yield was 13.12 g
(74.0 mmol; 74%). 8.86 g (50.0 mmol) of succinylated ethylene glycol
monomethyl ether was dissolved in 100 ml dioxane containing 5% dry
pyridine (5 ml), and 6.96 g (50.0 mmol) 4-nitrophenol and 10.32 g
(50.0 mmol) dicyclohexylcarbodiimide were added and the reaction run at
room temperature for 4 hours. Dicyclohexylurea was removed by
filtration, the filtrate evaporated in vacuo and the residue
redissolved in 50 ml anhydrous DMF. 12.5 ml (about 12.5 mmol
4-nitrophenylester) of this solution was used to dissolve 2.81 g
(10.0 mmol) 5-(3-aminopropynyl-1)-2'-deoxyuridine. The reaction was
performed in the presence of 1.01 g (10.0 mmol; 1.4 ml) triethylamine
overnight at room temperature. The reaction mixture was evaporated in
vacuo, co-evaporated with toluene, redissolved in dichloromethane and
chromatographed on silica gel (Si60, Merck; column 4 x 50 cm) with
dichloromethane/methanol mixtures. Fractions containing the desired
compound were collected, evaporated, redissolved in 25 ml
dichloromethane and precipitated into 250 ml pentane. The dried
precipitate of 5-(3-N-(O-succinyl ethylene glycol monomethyl
ether)-amidopropynyl-1)-2'-deoxyuridine (yield 65%) is
5'-O-dimethoxytritylated and transformed into the
nucleoside-3'-O-β-cyanoethyl-N,N-diisopropylphosphoamidite and
incorporated as a building block in the automated oligonucleotide
synthesis according to standard procedures. The mass-modified
nucleotide can substitute for one or more of the thymidine/uridine
residues in the nucleic acid primer sequence. Deprotection and
purification of the primer oligonucleotide also follows standard
procedures.
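
The yields and reagent excesses quoted in this procedure are plain
ratios of molar amounts. As a small illustrative check (the helper
names are hypothetical, and only amounts stated in the paragraph above
are used):

```python
def percent_yield(product_mmol: float, limiting_mmol: float) -> float:
    """Isolated yield as a percentage of the limiting starting material."""
    return 100.0 * product_mmol / limiting_mmol


def equivalents(reagent_mmol: float, substrate_mmol: float) -> float:
    """Molar equivalents of a reagent relative to the substrate."""
    return reagent_mmol / substrate_mmol


# Succinylation: 74.0 mmol product from 100.0 mmol glycol ether.
succinylation_yield = percent_yield(74.0, 100.0)

# Coupling: about 12.5 mmol 4-nitrophenyl ester per 10.0 mmol nucleoside.
ester_equivalents = equivalents(12.5, 10.0)

print(f"succinylation yield: {succinylation_yield:.0f}%")
print(f"active ester used: {ester_equivalents:.2f} equivalents")
```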

Mass modification at the heterocyclic base with diethylene glycol
monomethyl ether: Nucleosidic starting material was as in previous
examples, 5-(3-aminopropynyl-1)-2'-deoxyuridine. 12.02 g (100.0 mmol)
freshly distilled diethylene glycol monomethyl ether dissolved in 50 ml
absolute pyridine was reacted with 10.01 g (100.0 mmol) recrystallized
succinic anhydride in the presence of 1.22 g (10.0 mmol)
4-N,N-dimethylaminopyridine (DMAP) overnight at room temperature. Yield
was 18.35 g (82.3 mmol; 82.3%). 11.06 g (50.0 mmol) of succinylated
diethylene glycol monomethyl ether was transformed into the
4-nitrophenyl ester and, subsequently, 12.5 mmol was reacted with
2.81 g (10.0 mmol) of 5-(3-aminopropynyl-1)-2'-deoxyuridine. Yield
after silica gel column chromatography and precipitation into pentane
was 3.34 g (6.9 mmol; 69%). After dimethoxytritylation and
transformation into the nucleoside-β-cyanoethylphosphoamidite, the
mass-modified building block is incorporated into automated chemical
DNA synthesis. Within the sequence of the nucleic acid primer, one or
more of the thymidine/uridine residues can be substituted by this
mass-modified nucleotide.

Mass modification at the heterocyclic base with glycine:

Starting material was
N6-benzoyl-8-bromo-5'-O-(4,4'-dimethoxytrityl)-2'-deoxyadenosine (Singh
et al., Nuc. Acids Res. 18:3339-45, 1990). 632.5 mg (1.0 mmol) of this
8-bromo-deoxyadenosine derivative was suspended in 5 ml absolute
ethanol and reacted with 251.2 mg (2.0 mmol) glycine methyl ester
(hydrochloride) in the presence of 241.4 mg (2.1 mmol; 366 ul)
N,N-diisopropylethylamine and refluxed until the starting nucleosidic
material had disappeared (4-6 hours) as checked by thin layer
chromatography (TLC). The solvent was evaporated and the residue
purified by silica gel chromatography (column 2.5 x 50 cm) using
solvent mixtures of chloroform/methanol containing 0.1% pyridine.
Product fractions were combined, the solvent evaporated, the fractions
dissolved in 5 ml dichloromethane and precipitated into 100 ml pentane.
Yield was 487 mg (0.76 mmol; 76%). Transformation into the
corresponding nucleoside-β-cyanoethylphosphoamidite and integration
into automated chemical DNA synthesis is performed under standard
conditions. During final deprotection with aqueous concentrated
ammonia, the methyl group is removed from the glycine moiety. The
mass-modified building block can substitute one or more
deoxyadenosine/adenosine residues in the nucleic acid primer sequence.

Mass modification at the heterocyclic base with glycylglycine: 632.5 mg
(1.0 mmol)
N6-benzoyl-8-bromo-5'-O-(4,4'-dimethoxytrityl)-2'-deoxyadenosine was
suspended in 5 ml absolute ethanol and reacted with 324.3 mg (2.0 mmol)
glycyl-glycine methyl ester in the presence of 241.4 mg (2.1 mmol;
366 ul) N,N-diisopropylethylamine. The mixture was refluxed and
completeness of the reaction checked by TLC. Yield after silica gel
column chromatography and precipitation into pentane was 464 mg
(0.65 mmol; 65%). Transformation into the
nucleoside-β-cyanoethylphosphoamidite and into synthetic
oligonucleotides is done according to standard procedures.
Mass modification at the heterocyclic base with glycol monomethyl
ether: Starting material was
5'-O-(4,4-dimethoxytrityl)-2'-amino-2'-deoxythymidine synthesized
according to the literature (Verheyden et al., J. Org. Chem. 36:250-54,
1971; Sasaki et al., J. Org. Chem. 41:3138-43, 1976; Imazawa et al.,
J. Org. Chem. 44:2039-41, 1979; Hobbs et al., J. Org. Chem. 42:714-19,
1976; Ikehara et al., Chem. Pharm. Bull. Japan 26:240-44, 1978).
5'-O-(4,4-Dimethoxytrityl)-2'-amino-2'-deoxythymidine (559.62 mg;
1.0 mmol) was reacted with 2.0 mmol of the 4-nitrophenyl ester of
succinylated ethylene glycol monomethyl ether in 10 ml dry DMF in the
presence of 1.0 mmol (140 ul) triethylamine for 18 hours at room
temperature. The reaction mixture was evaporated in vacuo,
co-evaporated with toluene, redissolved in dichloromethane and purified
by silica gel chromatography (Si60, Merck; column: 2.5 x 50 cm; eluent:
chloroform/methanol mixtures containing 0.1% triethylamine). The
product containing fractions were combined, evaporated and precipitated
into pentane. Yield was 524 mg (0.73 mmol; 73%). Transformation into
the nucleoside-β-cyanoethyl-N,N-diisopropylphosphoamidite and
incorporation into the automated chemical DNA synthesis protocol is
performed by standard procedures. The mass-modified deoxythymidine
derivative can substitute for one or more of the thymidine residues in
the nucleic acid primer.

In an analogous way, by employing the 4-nitrophenyl ester of
succinylated diethylene glycol monomethyl ether and triethylene glycol
monomethyl ether, the corresponding mass-modified oligonucleotides are
prepared. In the case of only one incorporated mass-modified nucleoside
within the sequence, the mass difference between the ethylene,
diethylene and triethylene glycol derivatives is 44.05, 88.1 and 132.15
daltons, respectively.
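
The quoted increments are multiples of the mass of one oxyethylene unit
(C2H4O). The sketch below recomputes them from rounded standard atomic
weights. Reading the values this way is an interpretation rather than
something stated in the text, and the names are illustrative only.

```python
# Rounded standard atomic weights (assumed values for this sketch).
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "O": 15.999}

OXYETHYLENE_UNIT = {"C": 2, "H": 4, "O": 1}  # one -CH2-CH2-O- repeat


def formula_mass(formula: dict) -> float:
    """Average mass of a group given as element -> atom count."""
    return sum(ATOMIC_WEIGHT[element] * count for element, count in formula.items())


unit_mass = formula_mass(OXYETHYLENE_UNIT)
increments = {n: n * unit_mass for n in (1, 2, 3)}

for n, mass in increments.items():
    print(f"{n} oxyethylene unit(s): {mass:.2f} daltons")
```

The computed values agree with the figures in the text to within
rounding.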

Mass modification at the heterocyclic base by alkylation:

Phosphorothioate-containing oligonucleotides were prepared (Gait et
al., Nuc. Acids Res. 19:1183, 1991). One, several or all
internucleotide linkages can be modified in this way. The (-)M13
nucleic acid primer sequence (17-mer) 5'-dGTAAAACGACGGCCAGT (SEQ ID NO
29) is synthesized in 0.25 umole scale on a DNA synthesizer and one
phosphorothioate group introduced after the final synthesis cycle (G to
T coupling). Sulfurization, deprotection and purification followed
standard protocols. Yield was 31.4 nmole (12.6% overall yield),
corresponding to 31.4 nmole phosphorothioate groups. Alkylation was
performed by dissolving the residue in 31.4 ul TE buffer (0.01 M Tris
pH 8.0, 0.001 M EDTA) and by adding 16 ul of a 20 mM solution of
2-iodoethanol (320 nmole; 10-fold excess with respect to
phosphorothioate diesters) in N,N-dimethylformamide (DMF). The
alkylated oligonucleotide was purified by standard reversed phase HPLC
(RP-18 Ultrasphere, Beckman; column: 4.5 x 250 mm; 100 mM
triethylammonium acetate, pH 7.0 and a gradient of 5 to 40%
acetonitrile).
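
The amounts in the alkylation step can be cross-checked with a few
lines of arithmetic. This is an illustrative sketch only; the variable
names are hypothetical and all inputs are taken from the paragraph
above.

```python
def nmol(concentration_mM: float, volume_ul: float) -> float:
    """Amount in nmol; mM multiplied by ul gives nmol directly."""
    return concentration_mM * volume_ul


synthesis_scale_nmol = 250.0   # 0.25 umole synthesis scale
oligo_nmol = 31.4              # purified phosphorothioate primer

iodoethanol_nmol = nmol(20.0, 16.0)  # 16 ul of a 20 mM solution
overall_yield_pct = 100.0 * oligo_nmol / synthesis_scale_nmol
molar_excess = iodoethanol_nmol / oligo_nmol

print(f"2-iodoethanol added: {iodoethanol_nmol:.0f} nmol")
print(f"overall yield: {overall_yield_pct:.1f}%")
print(f"excess over phosphorothioate groups: {molar_excess:.1f}-fold")
```

The results match the stated 320 nmole, 12.6% overall yield and
roughly 10-fold excess.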

In a variation of this procedure, the nucleic acid primer containing
one or more phosphorothioate phosphodiester bonds is used in the Sanger
sequencing reactions. The primer-extension products of the four
sequencing reactions are purified, cleaved off the solid support,
lyophilized, dissolved in 4 ul each of TE buffer pH 8.0 and alkylated
by addition of 2 ul of a 20 mM solution of 2-iodoethanol in DMF. They
are then analyzed by ES and/or MALDI mass spectrometry.

In an analogous way, employing instead of 2-iodoethanol, e.g.,
3-iodopropanol, 4-iodobutanol, mass-modified nucleic acid primers are
obtained with a mass difference of 14.03, 28.06 and 42.03 daltons
respectively compared to the unmodified phosphorothioate
phosphodiester-containing oligonucleotide.

Example 18 Mass Modification of Nucleotide Triphosphates.

Mass modification of nucleotide triphosphates at the 2' and 3' amino
function: Starting material was 2'-azido-2'-deoxyuridine prepared
according to literature (Verheyden et al., J. Org. Chem. 36:250, 1971),
which was 4,4-dimethoxytritylated at 5'-OH with 4,4-dimethoxytrityl
chloride in pyridine and acetylated at 3'-OH with acetic anhydride in a
one-pot reaction using standard reaction conditions. With 191 mg
(0.71 mmol) 2'-azido-2'-deoxyuridine as starting material, 396 mg
(0.65 mmol; 90.8%)
5'-O-(4,4-dimethoxytrityl)-3'-O-acetyl-2'-azido-2'-deoxyuridine was
obtained after purification via silica gel chromatography. Reduction of
the azido group was performed (Barta et al., Tetrahedron 46:587-94,
1990). Yield of
5'-O-(4,4-dimethoxytrityl)-3'-O-acetyl-2'-amino-2'-deoxyuridine after
silica gel chromatography was 288 mg (0.49 mmol; 76%). This protected
2'-amino-2'-deoxyuridine derivative (588 mg, 1.0 mmol) was reacted with
2 equivalents (927 mg; 2.0 mmol) N-Fmoc-glycine pentafluorophenyl ester
in 10 ml dry DMF overnight at room temperature in the presence of
1.0 mmol (174 ul) N,N-diisopropylethylamine. Solvents were removed by
evaporation in vacuo and the residue purified by silica gel
chromatography. Yield was 711 mg (0.71 mmol; 82%). Detritylation was
achieved by a one hour treatment with 80% aqueous acetic acid at room
temperature. The residue was evaporated to dryness, co-evaporated twice
with toluene, suspended in 1 ml dry acetonitrile and 5'-phosphorylated
with POCl3 and directly transformed in a one-pot reaction to the
5'-triphosphate using 3 ml of a 0.5 M solution (1.5 mmol)
tetra(tri-n-butylammonium) pyrophosphate in DMF according to
literature. The Fmoc and the 3'-O-acetyl groups were removed by a
one-hour treatment with concentrated aqueous ammonia at room
temperature and the reaction mixture evaporated and lyophilized.
Purification also followed standard procedures by using anion-exchange
chromatography on DEAE Sephadex with a linear gradient of
triethylammonium bicarbonate (0.1 M-1.0 M). Triphosphate containing
fractions, checked by thin layer chromatography on polyethyleneimine
cellulose plates, were collected, evaporated and lyophilized. Yield by
UV-absorbance of the uracil moiety was 68% or 0.48 mmol.

A glycyl-glycine modified 2'-amino-2'-deoxyuridine-5'-triphosphate was
obtained by removing the Fmoc group from
5'-O-(4,4-dimethoxytrityl)-3'-O-acetyl-2'-N(N-9-fluorenylmethyloxycarbonyl-glycyl)-2'-amino-2'-deoxyuridine
by a one-hour treatment with a 20% solution of piperidine in DMF at
room temperature, evaporation of solvents, two-fold co-evaporation with
toluene and subsequent condensation with N-Fmoc-glycine
pentafluorophenyl ester. Starting with 1.0 mmol of the
2'-N-glycyl-2'-amino-2'-deoxyuridine derivative and following the
procedure described above, 0.72 mmol (72%) of the corresponding
2'-(N-glycyl-glycyl)-2'-amino-2'-deoxyuridine-5'-triphosphate was
obtained.

Starting with
5'-O-(4,4-dimethoxytrityl)-3'-O-acetyl-2'-amino-2'-deoxyuridine and
coupling with N-Fmoc-β-alanine pentafluorophenyl ester, the
corresponding 2'-(N-β-alanyl)-2'-amino-2'-deoxyuridine-5'-triphosphate
is synthesized. These modified nucleoside triphosphates are
incorporated during the Sanger DNA sequencing process in the
primer-extension products. The mass difference between the glycine,
β-alanine and glycyl-glycine mass-modified nucleosides is, per
nucleotide incorporated, 58.06, 72.09 and 115.1 daltons, respectively.
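
The three quoted values coincide numerically with the average masses of
the glycyl (C2H4NO), β-alanyl (C3H6NO) and glycyl-glycyl (C4H7N2O2)
acyl groups. Treating them that way is an interpretation, not something
the text states. The sketch below recomputes those masses from rounded
standard atomic weights; the formulas and names are assumptions made
for illustration.

```python
# Rounded standard atomic weights (assumed values for this sketch).
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

# Assumed compositions of the acyl groups H2N-(CH2)n-CO- and Gly-Gly.
ACYL_GROUPS = {
    "glycyl": {"C": 2, "H": 4, "N": 1, "O": 1},
    "beta-alanyl": {"C": 3, "H": 6, "N": 1, "O": 1},
    "glycyl-glycyl": {"C": 4, "H": 7, "N": 2, "O": 2},
}


def group_mass(composition: dict) -> float:
    """Average mass of a group given as element -> atom count."""
    return sum(ATOMIC_WEIGHT[el] * count for el, count in composition.items())


masses = {name: group_mass(comp) for name, comp in ACYL_GROUPS.items()}

for name, mass in masses.items():
    print(f"{name}: {mass:.2f} daltons")
```

The spacing between the three modifications (about 14 and 43 daltons)
is what allows fragments carrying different tags to be resolved in the
same spectrum.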

When starting with
5'-O-(4,4-dimethoxytrityl)-3'-amino-2',3'-dideoxythymidine, the
corresponding 3'-(N-glycyl)-3'-amino-, 3'-(N-glycyl-glycyl)-3'-amino-,
and 3'-(N-β-alanyl)-3'-amino-2',3'-dideoxythymidine-5'-triphosphates
can be obtained. These mass-modified nucleoside triphosphates serve as
a terminating nucleotide unit in the Sanger DNA sequencing reactions,
providing a mass difference per terminated fragment of 58.06, 72.09 and
115.1 daltons respectively when used in the multiplexing sequencing
mode. The mass-differentiated fragments are analyzed by ES and/or MALDI
mass spectrometry.

Mass modification of nucleotide triphosphates at C-5 of the
heterocyclic base: 0.281 g (1.0 mmol)
5-(3-aminopropynyl-1)-2'-deoxyuridine was reacted with either 0.927 g
(2.0 mmol) N-Fmoc-glycine pentafluorophenylester or 0.955 g (2.0 mmol)
N-Fmoc-β-alanine pentafluorophenyl ester in 5 ml dry DMF in the
presence of 0.129 g N,N-diisopropylethylamine (174 ul, 1.0 mmol)
overnight at room temperature. Solvents were removed by evaporation in
vacuo and the condensation products purified by flash chromatography on
silica gel (Still et al., J. Org. Chem. 43:2923-25, 1978). Yields were
476 mg (0.85 mmol; 85%) for the glycine and 436 mg (0.76 mmol; 76%) for
the β-alanine derivatives. For the synthesis of the glycyl-glycine
derivative, the Fmoc group of 1.0 mmol Fmoc-glycine-deoxyuridine
derivative was removed by one-hour treatment with 20% piperidine in DMF
at room temperature. Solvents were removed by evaporation in vacuo, the
residue was coevaporated twice with toluene and condensed with 0.927 g
(2.0 mmol) N-Fmoc-glycine pentafluorophenyl ester and purified as
described above. Yield was 445 mg (0.72 mmol; 72%). The glycyl-,
glycyl-glycyl- and β-alanyl-2'-deoxyuridine derivatives, N-protected
with the Fmoc group, were transformed to the 3'-O-acetyl derivatives by
tritylation with 4,4-dimethoxytrityl chloride in pyridine and
acetylation with acetic anhydride in pyridine in a one-pot reaction and
subsequently detritylated by one hour treatment with 80% aqueous acetic
acid according to standard procedures. Solvents were removed, the
residues dissolved in 100 ml chloroform and extracted twice with 50 ml
10% sodium bicarbonate and once with 50 ml water, dried with sodium
sulfate, the solvent evaporated and the residues purified by flash
chromatography on silica gel. Yields were 361 mg (0.60 mmol; 71%) for
the glycyl-, 351 mg (0.57 mmol; 75%) for the β-alanyl- and 323 mg
(0.49 mmol; 68%) for the glycyl-glycyl-3'-O-acetyl-2'-deoxyuridine
derivatives, respectively. Phosphorylation at the 5'-OH with POCl3,
transformation into the 5'-triphosphate by in situ reaction with
tetra(tri-n-butylammonium) pyrophosphate in DMF, 3'-de-O-acetylation,
cleavage of the Fmoc group, and final purification by anion-exchange
chromatography on DEAE-Sephadex was performed and yields according to
UV-absorbance of the uracil moiety were 0.41 mmol
5-(3-(N-glycyl)-amidopropynyl-1)-2'-deoxyuridine-5'-triphosphate (84%),
0.43 mmol
5-(3-(N-β-alanyl)-amidopropynyl-1)-2'-deoxyuridine-5'-triphosphate
(75%) and 0.38 mmol
5-(3-(N-glycyl-glycyl)-amidopropynyl-1)-2'-deoxyuridine-5'-triphosphate
(78%). These mass-modified nucleoside triphosphates were incorporated
during the Sanger DNA sequencing primer-extension reactions.

When using 5-(3-aminopropynyl)-2',3'-dideoxyuridine as starting
material and following an analogous reaction sequence, the
corresponding glycyl-, glycyl-glycyl- and
β-alanyl-2',3'-dideoxyuridine-5'-triphosphates were obtained in yields
of 69%, 63% and 71%, respectively. These mass-modified nucleoside
triphosphates serve as chain-terminating nucleotides during the Sanger
DNA sequencing reactions. The mass-modified sequencing ladders are
analyzed by either ES or MALDI mass spectrometry.

Mass modification of nucleotide triphosphates: 727 mg (1.0 mmol) of
N6-(4-tert-butylphenoxyacetyl)-8-glycyl-5'-(4,4-dimethoxytrityl)-2'-deoxyadenosine
or 800 mg (1.0 mmol)
N6-(4-tert-butylphenoxyacetyl)-8-glycyl-glycyl-5'-(4,4-dimethoxytrityl)-2'-deoxyadenosine
prepared according to literature (Koster et al., Tetrahedron 37:362,
1981) were acetylated with acetic anhydride in pyridine at the 3'-OH,
detritylated at the 5'-position with 80% acetic acid in a one-pot
reaction and transformed into the 5'-triphosphates via phosphorylation
with POCl3 and reaction in situ with tetra(tri-n-butylammonium)
pyrophosphate. Deprotection of the N6-tert-butylphenoxyacetyl, the
3'-O-acetyl and the O-methyl group at the glycine residues was achieved
with concentrated aqueous ammonia for ninety minutes at room
temperature. Ammonia was removed by lyophilization and the residue
washed with dichloromethane, solvent removed by evaporation in vacuo
and the remaining solid material purified by anion-exchange
chromatography on DEAE-Sephadex using a linear gradient of
triethylammonium bicarbonate from 0.1 to 1.0 M. The nucleoside
triphosphate containing fractions (checked by TLC on polyethyleneimine
cellulose plates) were combined and lyophilized. Yield of the
8-glycyl-2'-deoxyadenosine-5'-triphosphate (determined by UV-absorbance
of the adenine moiety) was 57% (0.57 mmol). The yield for the
8-glycyl-glycyl-2'-deoxyadenosine-5'-triphosphate was 51% (0.51 mmol).
These mass-modified nucleoside triphosphates were incorporated during
primer-extension in the Sanger DNA sequencing reactions.

When using the corresponding N6-(4-tert-butylphenoxyacetyl)-8-glycyl-
or -glycyl-glycyl-5'-O-(4,4-dimethoxytrityl)-2',3'-dideoxyadenosine
derivatives as starting materials (for the introduction of the
2',3'-function: Seela et al., Helvetica Chimica Acta 74:1048-58, 1991)
and using an analogous reaction sequence, the chain-terminating
mass-modified
nucleoside triphosphates 8-glycyl- and 8-glycyl-glycyl-2'.3'- 
dideoxyadenosine-5'-triphosphates were obtained in 53 and 47% yields, 
respectively. The mass-modified sequencing fragment ladders are analyzed 
by either ES or MALDI mass spectrometry. 

Example 19 Mass Modification of Nucleotides by Alkylation After Sanger
Sequencing.

2',3'-Dideoxythymidine-5'-(alpha-S)-triphosphate was
prepared according to published procedures (for the alpha-S-triphosphate
moiety: Eckstein et al., Biochemistry 15:1685, 1976 and Accounts Chem.
Res. 12:204, 1978; and for the 2',3'-dideoxy moiety: Seela et al., Helvetica
Chimica Acta 74:1048-58, 1991). Sanger DNA sequencing reactions
employing 2'-deoxythymidine-5'-(alpha-S)-triphosphate are performed
according to standard protocols. When using 2',3'-dideoxythymidine-5'-
(alpha-S)-triphosphate, this is used instead of the unmodified 2',3'-
dideoxythymidine-5'-triphosphate in standard Sanger DNA sequencing. The
template (2 picomole) and the nucleic acid M13 sequencing primer (4
picomole) are annealed by heating to 65 °C in 100 µl of 10 mM Tris-HCl,
pH 7.5, 10 mM MgCl2, 50 mM NaCl, 7 mM dithiothreitol (DTT) for 5
minutes and slowly brought to 37 °C during a one hour period. The
sequencing reaction mixtures contain, as exemplified for the T-specific
termination reaction, in a final volume of 150 µl, 200 µM (final
concentration) each of dATP, dCTP, dTTP, 300 µM c7-deaza-dGTP, 5 µM
2',3'-dideoxythymidine-5'-(alpha-S)-triphosphate and 40 units Sequenase.
Polymerization is performed for 10 minutes at 37°C, the reaction mixture
heated to 70 °C to inactivate the Sequenase, ethanol precipitated and coupled
to thiolated Sequelon membrane disks (8 mm diameter). Alkylation is
performed by treating the disks with 10 µl of 10 mM solution of either 2-
iodoethanol or 3-iodopropanol in NMM (N-methylmorpholine/water/2-
propanol, 2/49/49, v/v/v) (three times), washing with 10 µl NMM (three
times) and cleaving the alkylated T-terminated primer-extension products
off the support by treatment with DTT. Analysis of the mass-modified
fragment families is performed with either ES or MALDI mass
spectrometry.
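
As a rough illustration of the mass shift introduced by the alkylation step, the sketch below computes the mass added per alkylated thiophosphate. It assumes that S-alkylation replaces one hydrogen on sulfur with a 2-hydroxyethyl or 3-hydroxypropyl group; that assumption, the average atomic masses and the helper names are not taken from the specification.

```python
# Average atomic masses (g/mol); standard values, not given in the text.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}

# Net composition gained per alkylated sulfur, assuming the alkyl group
# replaces one hydrogen: -CH2CH2OH adds C2H4O, -CH2CH2CH2OH adds C3H6O.
NET_GAIN = {
    "2-iodoethanol": {"C": 2, "H": 4, "O": 1},
    "3-iodopropanol": {"C": 3, "H": 6, "O": 1},
}


def mass_shift(reagent: str) -> float:
    """Return the mass (Da) added per alkylated thiophosphate."""
    composition = NET_GAIN[reagent]
    return sum(ATOMIC_MASS[element] * count for element, count in composition.items())


shifts = {reagent: mass_shift(reagent) for reagent in NET_GAIN}
for reagent, shift in shifts.items():
    print(f"{reagent}: +{shift:.2f} Da per alkylated thiophosphate")
```

Under these assumptions the two reagents give fragment families that differ by one methylene unit (about 14 Da) per alkylated position.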

Example 20 Mass Modification of an Oligonucleotide.

This method, in addition to mass modification, also modifies
the phosphate backbone of the nucleic acids to a non-ionic polar form.
Oligonucleotides can be obtained by chemical synthesis or by enzymatic
synthesis using DNA polymerases and α-thio nucleoside triphosphates.

This reaction was performed using DMT-TpT as a starting
material but the use of an oligonucleotide with an alpha thio group is also
appropriate. For thiolation, 45 mg (0.05 mmol) of compound 1 (Figure 15)
is dissolved in 0.5 ml acetonitrile and thiolated in a 1.5 ml tube with 1,1-
dioxo-3H-benzo[1,2]dithiol-3-one (Beaucage reagent). The reaction was
allowed to proceed for 10 minutes and the product is concentrated by thin
layer chromatography with the solvent system dichloromethane/96%
ethanol/pyridine (87%/13%/1%; v/v/v). The thiolated compound 2 (Figure
15) is deprotected by treatment with a mixture of concentrated aqueous
ammonia/acetonitrile (1/1; v/v) at room temperature. This reaction is
monitored by thin layer chromatography and the quantitative removal of the
beta-cyanoethyl group was accomplished in one hour. This reaction mixture
was evaporated in vacuo.

To synthesize the S-(2-amino-2-oxoethyl)thiophosphate
triester of DMT-TpT (compound 4), the foam obtained after evaporation of
the reaction mixture (compound 3) was dissolved in 0.3 ml
acetonitrile/pyridine (5/1; v/v) and a 1.5 molar excess of iodoacetamide
added. The reaction was complete in 10 minutes and the precipitated salts
were removed by centrifugation. The supernatant is lyophilized, dissolved
in 0.3 ml acetonitrile and purified by preparative thin layer chromatography
with a solution of dichloromethane/96% ethanol (85%/15%; v/v). Two
fractions are obtained, each containing one of the two diastereoisomers. The
two forms were separated by HPLC.

Example 21 MALDI-MS Analysis of a Mass-Modified Oligonucleotide.

A 17-mer was mass modified at C-5 of one or two
deoxyuridine moieties. 5-[13-(2-Methoxyethoxyl)-tridecyne-1-yl]-5'-O-
(4,4'-dimethoxytrityl)-2'-deoxyuridine-3'-β-cyanoethyl-N,N-
diisopropylphosphoamidite was used to synthesize the modified 17-mers.
The modified 17-mers were:

                 X
                 |
d(TAAAACGACGGCCAGUG)    (molecular mass: 5454) (SEQ ID NO 30)

  X              X
  |              |
d(UAAAACGACGGCCAGUG)    (molecular mass: 5634) (SEQ ID NO 31)

where X = -C≡C-(CH2)11-OH
(unmodified 17-mer: molecular mass: 5273)

The samples were prepared and 500 fmol of each modified 17-
mer was analyzed using MALDI-MS. Conditions used were reflectron
positive ion mode with an acceleration of 5 kV and post-acceleration of 20
kV. The MALDI-TOF spectra which were generated were superimposed
and are shown in Figure 16. Thus, mass modification provides a distinction
detectable by mass spectrometry which can be used to identify base
sequence information.
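
The masses reported above can be turned into a per-modification mass shift. The sketch below does this arithmetic and compares the result with a shift estimated from the formula of X, under the assumption (not stated in the text) that each X takes the place of the 5-methyl group of a thymine in the unmodified 17-mer. The atomic masses and variable names are illustrative.

```python
# Molecular masses reported in Example 21, keyed by the number of X groups.
REPORTED_MASS = {0: 5273, 1: 5454, 2: 5634}

# Average atomic masses (g/mol); standard values, not given in the text.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}


def formula_mass(composition: dict) -> float:
    """Sum average atomic masses for an elemental composition."""
    return sum(ATOMIC_MASS[element] * count for element, count in composition.items())


# Observed shift for each additional modification.
observed_shifts = [REPORTED_MASS[n] - REPORTED_MASS[n - 1] for n in (1, 2)]

# X = -C≡C-(CH2)11-OH has the composition C13H23O. Assumption: X takes
# the place of the thymine 5-methyl group (CH3) of the unmodified 17-mer.
x_group = formula_mass({"C": 13, "H": 23, "O": 1})
methyl = formula_mass({"C": 1, "H": 3})
expected_shift = x_group - methyl

print("observed shifts (Da):", observed_shifts)
print(f"expected shift (Da): {expected_shift:.1f}")
```

Under that assumption the estimated shift of roughly 180 Da per modification agrees with the spacing between the three reported masses.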

Example 22 Capture and Sequencing of a Double-Stranded Target Nucleic
Acid.

In another experiment, a nucleic acid was captured and
sequenced by strand-displacement polymerization. This reaction is shown
schematically in Figure 17. Double-stranded DNA target was prepared by
PCR and attached to magnetic beads as described in Example 6. EcoR I
digested plasmid NB34 was used as the DNA template for amplification.
NB34 comprises a pCR™ II plasmid (Invitrogen) with a one kb target
human DNA insert. PCR was performed with a 16-nucleotide upstream
primer (primer I, 5'-AACAGCTATGACCATG-3'; SEQ ID NO. 32), and a
downstream 5'-end biotinylated 18-nucleotide primer (primer II, 5'-biotin-
CTGAATTAGTCAGGTTGG-3'; SEQ ID NO. 33). Five hundred basepair
PCR products, containing a single BstX I site, were immobilized by
attachment to magnetic beads which were resuspended in a total of 300 µl
reaction buffer containing 200 units of BstX I restriction endonuclease
(Boehringer Mannheim; Indianapolis, IN), 50 mM Tris-HCl pH 7.5, 10 mM
MgCl2, 100 mM NaCl and 1 mM dithiothreitol. The mixture was incubated
at 45 °C for three hours or until digestion was complete, which was
monitored by agarose gel electrophoresis. After digestion, magnetic beads
were washed twice with 300 µl of TE to remove digested and non-
immobilized fragments, excess nucleotides and restriction endonuclease.

This immobilized DNA was dephosphorylated by
resuspending the beads in 100 µl buffer (500 mM Tris-HCl, pH 9.0, 1 mM
MgCl2, 0.1 mM ZnCl2, and 1 mM spermidine) containing five units of calf
intestinal alkaline phosphatase (Promega; Madison, WI). The reaction was
incubated at 37°C for 15 minutes and at 56°C for 15 minutes. Five
additional units of calf intestinal alkaline phosphatase were added and a
second incubation was performed at 37 °C for 15 minutes and at 56 °C for
15 minutes. Beads were washed twice with TE and resuspended in 300 µl
of fresh TE containing 1 M NaCl.

Loading of the beads was checked by incubating 10 µl of the
beads with 10 µl of formamide at 95 °C for 5 minutes (or by boiling in TE).
The mixture was analyzed by 1% agarose gel electrophoresis with ethidium
bromide staining. A 10 µl bead aliquot generally contains about 80 ng of
immobilized double stranded DNA.

A partial duplex DNA probe containing a four base 3'
overhang was used as a sequencing primer and was ligated with BstX I
digested DNA fragments which were immobilized on magnetic beads. The
partial duplex had a 5'-fluorescein labeled 23 mer (DF25-5F) containing a
5' base pairing region and a 4-base 3' single stranded region (which is
complementary to the sequence of the 5'-protruding end of the
corresponding BstX I digested target DNA as prepared above) and a 19 mer
(G-CM1) complementary to the base pairing region of the 23 mer. The 19
mer was 5' phosphorylated by the T4 DNA Polymerase and annealed to the
corresponding 23 mer in TE at the same molar ratio. Beads, prepared from
alkaline phosphatase treatment which have about 10 pmol immobilized
DNA template, were ligated to 25 pmol of partially duplex probe in a 100
µl volume containing 200 units of T4 DNA ligase (New England Biolabs;
Beverly, MA), 50 mM Tris-HCl, pH 7.8, 10 mM MgCl2, 10 mM
dithiothreitol, 1 mM ATP, 25 µg/ml bovine serum albumin. Ligation
reactions were performed at room temperature for two hours or 4°C
overnight. Beads were washed twice with TE and resuspended in 300 µl of
the same buffer.
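
The bead-loading figure given above (about 80 ng of double-stranded DNA per 10 µl aliquot) can be converted to a molar amount. The sketch below assumes an average of 650 g/mol per base pair, a common approximation that is not given in the text, and uses the 500 basepair length of the undigested PCR product. The immobilized fragment left after BstX I digestion is shorter, so the true molar amount would be somewhat higher.

```python
AVG_MASS_PER_BP = 650.0  # g/mol per base pair; assumed approximation


def dsdna_pmol(mass_ng: float, length_bp: int) -> float:
    """Convert a mass of double-stranded DNA (ng) to picomoles."""
    molar_mass = length_bp * AVG_MASS_PER_BP  # g/mol
    moles = (mass_ng * 1e-9) / molar_mass     # mol
    return moles * 1e12                       # pmol


# About 80 ng per 10 µl aliquot; 500 bp is the assumed fragment length.
aliquot_pmol = dsdna_pmol(mass_ng=80, length_bp=500)

# The beads were resuspended in 300 µl, i.e. thirty 10 µl aliquots.
total_pmol = aliquot_pmol * (300 / 10)

print(f"{aliquot_pmol:.2f} pmol per 10 µl aliquot")
print(f"{total_pmol:.1f} pmol in 300 µl of beads")
```

Under these assumptions the estimate is of the same order as the roughly 10 pmol of immobilized template quoted for the ligation.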

Sequencing reactions: Thirty µl of beads containing the
ligation product were used for each sequencing reaction. Beads were
resuspended in a 13 µl volume containing 1.5 µl of 10 x Klenow buffer (100
mM Tris-HCl, pH 7.5, 50 mM MgCl2, and 75 mM dithiothreitol) and with
or without one µl of single stranded DNA binding protein (SSB, 5 µg/µl;
USB; Cleveland, Ohio). Mixtures were incubated on ice for 5 minutes
followed by the addition of 5 units of Klenow Fragment (New England
Biolabs). The reaction volume was split into four termination mixes, each
consisting of 1 µl DMSO and 3 µl of the appropriate termination mixture.
Termination mixtures were made in Klenow buffer and comprise the
nucleotide concentrations shown below in Table 11.

Table 11

Termination    dATP      dGTP      dCTP      dTTP      ddNTPs
Mix            in mM     in mM     in mM     in mM

ddATP mix      10        100       100       100       100 mM ddATP
ddGTP mix      100       5         100       100       120 mM ddGTP
ddCTP mix      100       100       10        100       100 mM ddCTP
ddTTP mix      100       100       100       5         500 mM ddTTP
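
To make the composition of Table 11 easier to compare across mixes, the sketch below stores the table as a dictionary and computes, for each mix, the ratio of the dideoxynucleotide to the deoxynucleotide with the same base, which is the one present at reduced concentration. The data structure and function name are illustrative; the concentrations are copied from the table in the units printed there.

```python
# Table 11: concentrations as printed (same unit for every entry).
TERMINATION_MIXES = {
    "ddATP": {"dATP": 10, "dGTP": 100, "dCTP": 100, "dTTP": 100, "ddNTP": 100},
    "ddGTP": {"dATP": 100, "dGTP": 5, "dCTP": 100, "dTTP": 100, "ddNTP": 120},
    "ddCTP": {"dATP": 100, "dGTP": 100, "dCTP": 10, "dTTP": 100, "ddNTP": 100},
    "ddTTP": {"dATP": 100, "dGTP": 100, "dCTP": 100, "dTTP": 5, "ddNTP": 500},
}


def terminator_ratio(mix_name: str) -> float:
    """Ratio of the ddNTP to the dNTP with the same base in one mix."""
    mix = TERMINATION_MIXES[mix_name]
    matching_dntp = "d" + mix_name[2:]  # e.g. "ddATP" -> "dATP"
    return mix["ddNTP"] / mix[matching_dntp]


ratios = {name: terminator_ratio(name) for name in TERMINATION_MIXES}
for name, ratio in ratios.items():
    print(f"{name} mix: ddNTP/dNTP = {ratio:g}")
```

Because the ratio is dimensionless, it does not depend on the concentration unit printed in the table.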



Termination mixtures were incubated for 20 minutes at
ambient temperature. Two µl of chase solution (0.5 mM of each of four
dNTPs in Klenow buffer) were added to each reaction tube and mixtures
were incubated for another 15 minutes, again at ambient temperature.
Magnetic beads were precipitated with a magnetic particle concentrator (or
centrifugation) and the supernatant discarded. Beads were resuspended in
a solution containing 10 µl of deionized formamide, 5 mg/ml dextran blue
and 0.1% SDS, and heated to 95 °C for 5 minutes, and stored on ice for less
than 10 minutes. Samples were analyzed on a DNA sequencing gel and on
an ALF DNA sequencer (Pharmacia; Piscataway, NJ) using a 6%
polyacrylamide gel with 7 M urea and 0.6 x TBE. Surprisingly, sequencing
reactions performed in the presence of single-stranded DNA binding protein
showed considerable improvement in resolution. Only 50 bases were
resolved from reactions performed without single-stranded DNA binding
protein (Figure 18, bottom panel) whereas 200 bases could be resolved from
reactions performed in the presence of single-stranded DNA binding protein
(Figure 18, top panel).

Example 23 Specificity of Double-Strand Sequencing by Strand
Displacement.

Another experiment was performed to determine the
specificity and applicability of the nick translation strand displacement
method of sequencing double-stranded nucleic acids. A schematic of the
experimental design is shown in Figure 19. Briefly, a double-stranded target
DNA was prepared by digesting double-stranded φX174 phage DNA with
TspR I restriction endonuclease. TspR I has a recognition site of
NNCAGTGNN and cleaves φX174 into 12 fragments each with distinctive
3' protruding ends. Possible ends are shown in Table 12.



Table 12

 1   5'-AACACTGAC-3'         7   5'-GTCAGTGTT-3'
 2   5'-AACAGTGGA-3'         8   5'-GTCAGTGGT-3'
 3   5'-ACCACTGAC-3'         9   5'-GTCACTGAT-3'
 4   5'-AACACTGGT-3'        10   5'-TCCACTGTT-3'
 5   5'-ATCAGTGAC-3'        11   5'-TGCAGTGGA-3'
 6   5'-ACCAGTGTT-3'        12   5'-TCCACTGCA-3'
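
The ends in Table 12 can be checked against the recognition site programmatically. The sketch below verifies that every listed 9-mer carries the core CAGTG, or its reverse complement CACTG, flanked by two bases on each side, and groups the ends into reverse-complement pairs. Function and variable names are illustrative.

```python
import re

# Table 12: the twelve possible 9-base ends, keyed by their table number.
ENDS = {
    1: "AACACTGAC", 2: "AACAGTGGA", 3: "ACCACTGAC", 4: "AACACTGGT",
    5: "ATCAGTGAC", 6: "ACCAGTGTT", 7: "GTCAGTGTT", 8: "GTCAGTGGT",
    9: "GTCACTGAT", 10: "TCCACTGTT", 11: "TGCAGTGGA", 12: "TCCACTGCA",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# NNCAGTGNN read on either strand: two bases, CA[GC]TG, two bases.
SITE = re.compile(r"^[ACGT]{2}CA[GC]TG[ACGT]{2}$")
all_match = all(SITE.match(seq) is not None for seq in ENDS.values())

# Pair each end with the table entry that is its reverse complement.
index_of = {seq: number for number, seq in ENDS.items()}
pairs = sorted(
    {
        tuple(sorted((number, index_of[reverse_complement(seq)])))
        for number, seq in ENDS.items()
    }
)

print("all ends match the site:", all_match)
print("reverse-complement pairs:", pairs)
```

The twelve entries fall into six reverse-complement pairs, so each listed end has exactly one complementary partner in the table.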



φX174 DNA (5 pmol) was dephosphorylated using calf
intestinal alkaline phosphatase. Briefly, φX174 DNA was resuspended in
100 µl buffer (500 mM Tris-HCl, pH 9.0, 1 mM MgCl2, 0.1 mM ZnCl2, and
1 mM spermidine) containing 5 units of calf intestinal alkaline phosphatase
(Promega; Madison, WI). The reaction was incubated at 37°C for 15
minutes and at 56°C for 15 minutes. Five additional units of calf intestinal
alkaline phosphatase were added and a second incubation was performed at
37°C for 15 minutes and at 56°C for 15 minutes. DNA in the samples was
extracted once with phenol, once with phenol/chloroform, and once with
chloroform, after which nucleic acid was precipitated in 0.3 M sodium
acetate/2.5 volumes ethanol. Precipitated φX174 DNA was washed twice
with TE and resuspended in 300 µl of TE containing 1 M NaCl.

Double-stranded probes, comprising biotin (B), fluorescein
(F), and infrared dye (CY5) labels, were synthesized and anchored to magnetic
beads as shown in Table 13.

Table 13

DF27-1        5'-[sequence illegible in source]-3'        (SEQ ID NO. 34)
              3'B-CTACTAGGCTGCGTAGTG-p-5'                 (SEQ ID NO. 35)

DF27-2        5'-[sequence illegible in source]-3'        (SEQ ID NO. 36)
              3'B-CTACTAGGCTGCGTAGTG-p-5'                 (SEQ ID NO. 37)

DF27-3        5'-[sequence illegible in source]-3'        (SEQ ID NO. 38)
              3'B-CTACTAGGCTGCGTAGTG-p-5'                 (SEQ ID NO. 39)

DF27-4        5'-[sequence illegible in source]-3'        (SEQ ID NO. 40)
              3'B-CTACTAGGCTGCGTAGTG-p-5'                 (SEQ ID NO. 41)

DF27-5-CY5    5'-[sequence illegible in source]-3'        (SEQ ID NO. 42)
              3'B-CTACTAGGCTGCGTAGTG-p-5'                 (SEQ ID NO. 43)

DF27-6-CY5    5'CY5-GATGATCCGACGCATCACAACAGTGGA-3'        (SEQ ID NO. 44)
              3'B-CTACTAGGCTGCGTAGTG-p-5'                 (SEQ ID NO. 45)

DF27-7        5'-F-GATGATCCGACGCATCACGTCAGTGGT-3'         (SEQ ID NO. 46)
              3'B-CTACTAGGCTGCGTAGTC-p-5'                 (SEQ ID NO. 47)

DF27-8        5'-F-GATGATCCGACGCATCACAACACTGGT-3'         (SEQ ID NO. 48)
              3'B-CTACTAGGCTGCGTAGTG-p-5'                 (SEQ ID NO. 49)

DF27-9        5'-F-GATCATCCCAGGGATCACAAGAGTGAC-3'         (SEQ ID NO. 50)
              3'B-CTACTAGGGTCCCTAGTG-p-5'                 (SEQ ID NO. 51)

DF27-10       5'-F-GATGATCCGACGCATCACACCACTGAC-3'         (SEQ ID NO. 52)
              3'B-CTACTAGGCTGCGTAGTG-p-5'                 (SEQ ID NO. 53)
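
The legible entries of Table 13 follow a common design: an 18-base duplex region and a 9-base single-stranded 3' end on the labeled strand. The sketch below checks this for three probes and looks up which Table 12 end is the reverse complement of each single-stranded region, on the assumption that this is the target end the probe anneals to. The helper names are illustrative, and only rows whose printed sequences are fully legible and mutually consistent are included.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Labeled (top) strands written 5'->3', as printed in Table 13.
TOP_STRANDS = {
    "DF27-6-CY5": "GATGATCCGACGCATCACAACAGTGGA",
    "DF27-8": "GATGATCCGACGCATCACAACACTGGT",
    "DF27-10": "GATGATCCGACGCATCACACCACTGAC",
}
# Biotinylated (bottom) strand written 3'->5', as printed in Table 13.
BOTTOM_3_TO_5 = "CTACTAGGCTGCGTAGTG"

# Table 12 ends, keyed by table number.
ENDS = {
    1: "AACACTGAC", 2: "AACAGTGGA", 3: "ACCACTGAC", 4: "AACACTGGT",
    5: "ATCAGTGAC", 6: "ACCAGTGTT", 7: "GTCAGTGTT", 8: "GTCAGTGGT",
    9: "GTCACTGAT", 10: "TCCACTGTT", 11: "TGCAGTGGA", 12: "TCCACTGCA",
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def split_probe(top: str) -> tuple:
    """Split a top strand into its duplex region and its 3' overhang."""
    duplex_length = len(BOTTOM_3_TO_5)
    return top[:duplex_length], top[duplex_length:]


index_of = {seq: number for number, seq in ENDS.items()}
duplex_ok = {}
target_end = {}
for name, top in TOP_STRANDS.items():
    duplex, overhang = split_probe(top)
    # The bottom strand is written 3'->5', so a base-by-base complement
    # of the duplex region (without reversal) should reproduce it.
    duplex_ok[name] = duplex.translate(COMPLEMENT) == BOTTOM_3_TO_5
    # Assumed annealing partner: the Table 12 end complementary to the overhang.
    target_end[name] = index_of[reverse_complement(overhang)]

print("duplex regions complementary:", duplex_ok)
print("complementary Table 12 end:", target_end)
```

Each of the three overhangs is itself one of the Table 12 ends, and under the stated assumption each probe would pair with a single, different end from the table.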



Beads with about 25 pmol of immobilized primer were ligated
to 3 pmol of TspR I digested φX174 DNA in 50 µl containing 400 units of
T4 DNA ligase (New England Biolabs; Beverly, MA), 50 mM Tris-HCl, pH
7.8, 10 mM MgCl2, 10 mM dithiothreitol, 1 mM ATP and 25 µg/ml bovine
serum albumin. Ligation reactions were performed at 37°C for 30 minutes,
at 50°C to 55°C for one hour (thermal ligase), at room temperature for 2
hours or at 4°C overnight. After ligation, beads were washed twice with
TE and resuspended in 300 µl of the same buffer.

Sequencing reactions: For each sequencing reaction, 30 µl of
beads containing the ligation product was used. Beads were resuspended in
a 13 µl volume containing 1.5 µl of 10 x Klenow buffer (100 mM Tris-HCl,
pH 7.5, 50 mM MgCl2 and 75 mM dithiothreitol), and with or without 1 µl
of single-stranded DNA binding protein (SSB, 5 µg/µl; USB; Cleveland,
Ohio). Reaction mixtures were incubated on ice for 5 minutes, followed by
the addition of 5 units of Klenow Fragment (New England Biolabs). The
reaction volume was split into four termination mixes, each consisting of 1
µl DMSO plus 3 µl of the appropriate termination mix. Termination mixes
were made in Klenow buffer and comprise the nucleotide concentrations
shown in Table 11.

Termination mixtures were incubated for 20 minutes at
ambient temperature. Two µl of a chase solution containing 0.5 mM of each
of the four dNTPs in Klenow buffer was added to each reaction tube and
mixtures were incubated for another 15 minutes at ambient temperature.
Beads were precipitated by magnetic particle concentrator or centrifugation
and the supernatant discarded. Precipitated beads were resuspended in TE
or in a solution containing 10 µl of deionized formamide, 5 mg/ml dextran blue
and 0.1% SDS, and heated to 95 °C for 5 minutes. Mixtures were stored on
ice for less than 10 minutes and analyzed by a DNA sequencing gel and on
an ALF DNA sequencer (Pharmacia; Piscataway, NJ) using a 6%
polyacrylamide gel with 7 M urea and 0.6 x TBE.

One double stranded primer was used for each reaction and the
results achieved using primers DF27-1, DF27-2, DF27-4, DF27-5-CY5 and
DF27-6-CY5, are shown in Figures 20, 21, 22, 23 and 24, respectively.
Each primer was capable of generating sequencing information of up to 200
basepairs without significant interference from the 11 fragments with non-
complementary ends.

Other embodiments and uses of the invention will be apparent
to those skilled in the art from consideration of the specification and practice
of the invention disclosed herein. All U.S. Patents and other references
noted herein are specifically incorporated by reference. The specification
and examples should be considered exemplary only with the true scope and
spirit of the invention indicated by the following claims.

We Claim:

1. A method for sequencing a target nucleic acid comprising the steps
of:

a) providing a set of nucleic acid fragments each containing a
sequence that corresponds to a sequence of said target;

b) hybridizing said set to an array of nucleic acid probes,
wherein each probe comprises a double-stranded portion, a
single-stranded portion and a variable sequence within said
single-stranded portion, to form a target array of nucleic
acids;

c) determining molecular weights for a plurality of nucleic acids
of said target array; and

d) determining the sequence of said target nucleic acid.

2. A method for sequencing a target nucleic acid comprising the steps
of:

a) providing a set of nucleic acid fragments each containing a
sequence that corresponds to a sequence of said target;

b) hybridizing said set to an array of nucleic acid probes,
wherein each probe comprises a double-stranded portion, a
single-stranded portion and a variable sequence within said
single-stranded portion;

c) creating a mass modified extended nucleic acid by extending
and mass modifying a strand of the probe using the hybridized
fragment as a template;

d) determining molecular weights for a plurality of mass
modified extended nucleic acids by mass spectrometry; and

e) determining the sequence of said target nucleic acid.

3. A method for sequencing a target nucleic acid comprising the steps
of:

a) providing a set of partially single-stranded nucleic acid
fragments wherein each fragment contains a sequence that
corresponds to a sequence of the target;

b) hybridizing the single-stranded portions of the fragments to
single-stranded portions of a set of partially double-stranded
nucleic acid probes to form a set of complexes, and for each
complex;

i) ligating a single strand of the fragment to an adjacent
single strand of the probe; and

ii) extending the unligated strand of the complex by
strand-displacement polymerization using the ligated
strand as a template; and

c) determining the sequence of the target.

4. A method for sequencing a target nucleic acid comprising the steps
of:

a) providing a set of nucleic acid fragments each containing a
sequence which corresponds to a sequence of said target;

b) hybridizing said set of fragments to an array of mass modified
probes, wherein each probe comprises a double-stranded
portion, a single-stranded portion and a variable sequence
within said single-stranded portion;

c) extending a strand of the mass modified probes using the
hybridized fragments as templates;

d) determining molecular weights for a plurality of extended
mass modified strands; and

e) determining the sequence of said target.

5. A method for sequencing a target nucleic acid comprising the steps
of:

a) providing a set of partially single-stranded nucleic acid
fragments wherein each fragment contains a sequence that
corresponds to a sequence of the target;

b) hybridizing the single-stranded portions of the fragments to
single-stranded portions of a set of partially double-stranded
nucleic acid probes to form a set of complexes, and for each
complex;

i) ligating a single strand of the fragment to an adjacent
single strand of the probe; and

ii) extending the unligated strand of the complex by
strand-displacement polymerization using the ligated
strand as a template and mass-modifying the extended
strand;

c) determining the molecular weights of the extended strands by
mass spectrometry; and

d) determining the sequence of the target from the molecular
weights of the extended strands.

6. A method for sequencing a target nucleic acid comprising the steps
of:

a) providing a set of nucleic acids complementary to a sequence
of said target;

b) hybridizing said set to an array of single-stranded nucleic acid
probes wherein each probe comprises a constant sequence and
a variable sequence and said variable sequence is
determinable;

c) determining molecular weights of hybridized nucleic acids;
and

d) identifying the sequence of said target.

7. A method for sequencing a target nucleic acid comprising the steps
of:

a) providing a set of nucleic acids homologous to a sequence of
said target;

b) hybridizing said set to an array of single-stranded nucleic acid
probes wherein each probe comprises a constant sequence and
a variable sequence;

c) determining molecular weights of hybridized nucleic acids;
and

d) identifying the sequence of said target.

8. A method for sequencing a target nucleic acid comprising the steps
of:

a) providing a set of partially single-stranded nucleic acid
fragments wherein each fragment contains a sequence that
corresponds to a sequence of the target;

b) hybridizing the single-stranded portions of the fragments to
single-stranded portions of a set of partially double-stranded
nucleic acid probes to form a set of complexes wherein each
probe contains a variable sequence within the single-stranded
region, and for each complex;

i) ligating a single strand of the fragment to an adjacent
single strand of the probe; and

ii) extending the unligated strand of the complex by
strand displacement polymerization using the ligated
strand as a template;

c) determining the molecular weights of the extended strands by
mass spectrometry; and

d) determining the sequence of the target from the molecular
weights of the extended strands.

9. A method for sequencing a target nucleic acid comprising the steps
of:

a) providing a set of nucleic acid fragments each containing a
sequence which corresponds to a sequence of said target;

b) hybridizing said set to an array of nucleic acid probes,
wherein each probe comprises a double-stranded portion, a
single-stranded portion and a variable sequence within said
single-stranded portion;

c) extending a strand of the probe enzymatically using the
hybridized fragment as a template to create an extended
nucleic acid;

d) removing alkali cations from said extended nucleic acid;

e) determining molecular weights for a plurality of protonated
and extended nucleic acids by mass spectrometry; and

f) determining the sequence of said target.

10. A method for sequencing a target nucleic acid comprising the steps
of:

a) providing a set of nucleic acid fragments each containing a
sequence which corresponds to a sequence of said target;

b) hybridizing said set to an array of nucleic acid probes wherein
each probe comprises a double-stranded portion, a single-
stranded portion and a variable sequence within said single-
stranded portion, to form a target array of nucleic acids;

c) extending a strand of the probe using the hybridized fragment
as a template;

d) determining molecular weights for a plurality of nucleic acids
of said target array; and

e) determining the sequence of said target.

11. A method for sequencing a target nucleic acid comprising the steps
of:

a) fragmenting a sequence of the target into nucleic acid
fragments;

b) hybridizing said fragments to an array of nucleic acid probes
wherein each probe comprises a double-stranded portion, a
single-stranded portion and a variable sequence within said
single-stranded portion and said array is attached to a solid
support;

c) determining molecular weights of hybridized fragments by
mass spectrometry;

d) determining nucleotide sequences of the hybridized
fragments; and

e) identifying the sequence of said target.

12. The method of claims 1-11 wherein the target nucleic acid is obtained
from a biological or recombinant source.

13. The method of claims 1-11 wherein the target nucleic acid and the
probe are each between about 10 to about 1,000 nucleotides in length.

14. The method of claims 1-11 wherein the sequence is homologous with
at least a portion of said target sequence.

15. The method of claims 1-11 wherein the sequence is complementary
to at least a portion of said target sequence.

16. The method of claims 1-11 wherein the set, the fragments or the
probes are dephosphorylated by treatment with a phosphatase prior to
hybridization.

17. The method of claims 1-11 wherein the set or the fragments are 
created by enzymatically or physically cleaving said target, or by 
enzymatically replicating said target with chain terminating and chain 
elongating nucleotides. 

18. The method of claims 1-5 or 8-11 wherein the fragments comprise a
nested set.

19. The method of claims 1-11 wherein the target, the fragments and the
probes comprise DNA, RNA, PNA or modifications or combinations
thereof.

20. The method of claims 1-11 wherein the fragments are provided by
synthesizing a complementary copy of the target sequence and fragmenting
said target sequence by nuclease digestion.

21. The method of claims 1-11 wherein the fragments are provided by
enzymatically polymerizing complementary copies of said target with chain
terminating and chain elongating nucleotides.

22. The method of claims 1-11 wherein the nucleic acid fragments
comprise greater than about 10^4 different members and each member is
between about 10 to about 1,000 nucleotides in length.

23. The method of claims 1-11 wherein the set or the target fragments is 
provided by enzymatically polymerizing complementary copies of said 
target with chain terminating and chain elongating nucleotides. 

24. The method of claim 23 wherein enzymatic polymerization is a
nucleic acid amplification process selected from the group consisting of
strand displacement amplification, ligase chain reaction, Qβ replicase
amplification, 3SR amplification and polymerase chain reaction
amplification.

25. The method of claims 6 or 7 wherein the constant sequence is
between about 3 to about 18 nucleotides in length.

26. The method of claims 1-11 wherein the single-stranded portion of 
each probe contains a variable sequence of between about 4 to about 9 
nucleotides in length. 

27. The method of claims 1-11 wherein the fragments, the set of nucleic
acids or the probes are attached to a solid support.

28. The method of claims 1-11 wherein each probe is between about 10
to about 50 nucleotides in length.

29. The method of claims 1-5 or 8-11 wherein the double-stranded 
regions of the probes contain the same sequence for each probe of the set. 

30. The method of claims 1-11 further comprising the step of ligating
hybridized fragments to said probes.

31. The method of claims 1-11 further comprising the step of extending
a strand of the probe using the hybridized fragment as a template wherein
the extended strand displaces the hybridized fragment.

32. The method of claim 31 wherein the extended strand comprises
between about 0.1 femtomole to about 1.0 nanomole of nucleic acid.

33. The method of claim 31 wherein the extended strand is between
about 10 to about 100 nucleotides in length.

34. The method of claims 1-11 wherein there are less than or equal to 4^R
different probes and R is the length in nucleotides of the variable sequence.

35. The method of claim 27 wherein the solid support is selected from 
the group consisting of plates, beads, microbeads, whiskers, combs, 
hybridization chips, membranes, single crystals, ceramics and self-
assembling monolayers.

36. The method of claim 27 wherein the probes are conjugated with
biotin or a biotin derivative and the solid support is conjugated with avidin,
streptavidin or a derivative thereof.

37. The method of claim 27 wherein the probes are attached to said solid 
25 support by covalent bond, an electrostatic bond, a hydrogen bond, a 



WO 96/32504 



PCT/US96/0S136 



113 

photocleavable bond, an electrostatic bond, a disulfide bond, a peptide bond, 
a diester bond, a selectively releasable bond or a combination thereof. 

38. The method of claim 37 wherein the attachment is a cleavable 
attachment which is cleavable by heat, an enzyme, a chemical agent or
electromagnetic radiation.

39. The method of claim 38 wherein the chemical agent is selected from 
the group consisting of reducing agents, oxidizing agents, hydrolyzing 
agents and combinations thereof. 

40. The method of claim 38 wherein the electromagnetic radiation is 
selected from the group consisting of visible, ultraviolet and infrared
radiation.

41. The method of claim 37 wherein the selectively releasable bond is 
4,4'-dimethoxytrityl or a derivative thereof.

42. The method of claim 41 wherein the derivative is selected from the 
group consisting of 3 or 4 [bis-(4-methoxyphenyl)]-methyl-benzoic acid,
N-succinimidyl- 3 or 4 [bis-(4-methoxyphenyl)]-methyl-benzoic acid,
N-succinimidyl- 3 or 4 [bis-(4-methoxyphenyl)]-hydroxymethyl-benzoic acid,
N-succinimidyl- 3 or 4 [bis-(4-methoxyphenyl)]-chloromethyl-benzoic acid
and salts thereof. 

43. The method of claim 27 further comprising a spacer between the
probe and the solid support. 

44. The method of claim 43 wherein the spacer is selected from the group 
consisting of oligopeptides, oligonucleotides, oligopolyamides, 
oligoethyleneglycerol, oligoacrylamides, alkyl chains of between about 6 to 
about 20 carbon atoms and combinations thereof.

45. The method of claims 1, 6, 7 or 11 wherein the probe is extended
using the hybridized strand as a template. 

46. The method of claims 2-5, 8-10 or 45 wherein extending comprises
polymerization incorporating mass-modifying nucleotides into the extended
strand.

47. The method of claims 2-5, 8-10 or 45 wherein the strand is extended 
enzymatically using chain terminating and chain elongating nucleotides. 

48. The method of claims 2-5, 8-10 or 45 wherein a plurality of extended 
strands comprise about 0.1 femtomole to about 1.0 nanomole of nucleic
acid.

49. The method of claims 1-5 or 8-11 wherein the sequence is determined
by polyacrylamide electrophoresis, capillary electrophoresis or mass 
spectrometry. 

50. The method of claim 46 wherein the mass modified extended nucleic
acid comprises between about 0.1 femtomole to about 1.0 nanomole of 
nucleic acid. 

51. The method of claim 46 wherein the mass modified extended nucleic
acid is between about 10 to about 100 nucleotides in length.

52. The method of claim 46 wherein the mass modified extended strand
contains a plurality of mass modifying functionalities.

53. The method of claims 1-11 wherein the strand of said probe is mass
modified by enzymatically extending said strand using a polymerase and a
mass modified nucleotide.

54. The method of claim 53 wherein the mass modified nucleotide is a
chain elongating or chain terminating nucleotide.

55. The method of claim 53 wherein the mass modified nucleotide
contains a plurality of mass modifying functionalities. 

56. The method of claim 53 wherein the mass modified probes contain 
a plurality of mass modifying functionalities. 

57. The method of claims 52, 55 or 56 wherein at least one mass
modifying functionality is coupled to a heterocyclic base, a sugar moiety or 
a phosphate group. 

58. The method of claims 52, 55 or 56 wherein the mass modifying 
functionality is a chemical moiety that does not interfere with hydrogen
bonding for base-pair formation.

59. The method of claims 52, 55 or 56 wherein the mass modifying 
functionality is coupled to a purine at position C2, N3, N7 or C8 or a 
deazapurine at position N7 or C9. 

60. The method of claims 52, 55 or 56 wherein the mass modifying 
functionality is coupled to a pyrimidine at position C5 or C6.

61. The method of claims 52, 55 or 56 wherein the mass modifying 
functionality is selected from the group consisting of deuterium, F, Cl, Br,
I, SiR, Si(CH3)3, Si(CH3)2(C2H5), Si(CH3)(C2H5)2, Si(C2H5)3, (CH2)nCH3,
(CH2)nNR, CH2CONR, (CH2)nOH, CH2F, CHF2 and CF3; wherein n is an integer
and R is selected from the group consisting of
-H, deuterium and alkyls, alkoxys and aryls of 1-6 carbon atoms, 
polyoxymethylene, monoalkylated polyoxymethylene, polyethylene imine, 
polyamide, polyester, alkylated silyl, heterooligo/polyaminoacid and 
polyethylene glycol. 

62. The method of claims 52, 55 or 56 wherein the mass modifying
functionality is generated from a precursor functionality which is -N3 or
-XR, wherein X is selected from the group consisting of -OH, -NH2, -NHR,
-SH, -NCS, -OCO(CH2)nCOOH, -NHCO(CH2)nCOOH, -OSO2OH,
-OCO(CH2)nI and -OP(O-alkyl)-N-(alkyl)2, and n is an integer from 1 to 20;
and R is selected from the group consisting of -H, deuterium and alkyls,
alkoxys and aryls of 1-6 carbon atoms, polyoxymethylene, monoalkylated
polyoxymethylene, polyethylene imine, polyamide, polyester, alkylated
silyl, heterooligo/polyaminoacid and polyethylene glycol.

63. The method of claims 1, 6, 7 or 11 wherein the hybridized nucleic
acid fragment is extended.

64. The method of claims 2, 3, 4, 5, 8-10 or 63 wherein the extended
nucleic acid is mass modified by thiolation. 

65. The method of claim 64 wherein thiolation is performed by treating 
said extended strand with a Beaucage reagent.

66. The method of claims 2, 3, 4, 5, 8-10 or 63 wherein the extended
nucleic acid is mass modified by alkylation. 

67. The method of claim 66 wherein alkylation is performed by treating 
said extended strand with iodoacetamide. 

68. The method of claim 66 further comprising the step of removing 
alkali cations from said mass modified extended nucleic acid.

69. The method of claim 68 wherein alkali cations are removed by ion 
exchange. 

70. The method of claim 69 wherein ion exchange comprises contacting 
said extended nucleic acid with a solution selected from the group consisting
of ammonium acetate, ammonium carbonate, diammonium hydrogen citrate,
ammonium tartrate and combinations thereof.

71. The method of claims 2, 5, 8, 9, 11 or 49 wherein mass spectrometry
includes a release step selected from the group consisting of laser heating, 
droplet release, electrical release, photochemical release and electrospray. 

72. The method of claims 2, 5, 8, 9, 11 or 49 wherein mass spectrometry
includes an analytical step selected from the group consisting of Fourier
Transform, ion cyclotron resonance, time of flight analysis with reflection,
time of flight analysis without reflection and quadrupole analysis. 

73. The method of claims 2, 5, 8, 9, 11 or 49 wherein mass spectrometry 
is performed by fast atom bombardment, plasma desorption, matrix-assisted
laser desorption/ionization, electrospray, photochemical release, electrical
release, droplet release, resonance ionization or a combination thereof. 

74. The method of claims 2, 5, 8, 9, 11 or 49 wherein mass spectrometry 
includes time of flight with reflection, time of flight without reflection, 
electrospray, Fourier transform, ion trap, resonance ionization, ion cyclotron
resonance or a combination thereof.

75. The method of claims 1, 2, 4, 6, 7, 9, 10 or 11 wherein two or more
molecular weights are determined simultaneously. 

76. The method of claims 1, 2, 4, 6, 7, 9, 10 or 11 wherein molecular
weights are determined by matrix-assisted laser desorption ionization mass
spectrometry and time of flight analysis.

77. The method of claims 1, 2, 4, 6, 7, 9, 10 or 11 wherein molecular
weights are determined by electrospray ionization mass spectrometry and 
quadrupole analysis. 

78. A method for detecting a target nucleic acid comprising the steps of:

a) providing a set of nucleic acids complementary to a sequence
of said target;

b) hybridizing said set to a fixed array of nucleic acid probes
wherein each probe comprises a double-stranded portion, a
single-stranded portion and a variable sequence within said
single-stranded portion which is determinable;

c) determining molecular weights of hybridized nucleic acids by
mass spectrometry; and

d) identifying a sequence of the target.

79. A method for detecting a target nucleic acid comprising the steps of:

a) providing a set of nucleic acids complementary to a sequence
of said target;

b) hybridizing said set to a fixed array of nucleic acid probes
wherein each probe comprises a double-stranded portion, a
single-stranded portion and a variable sequence within said
single-stranded portion to form a target array of nucleic acids;

c) mass modifying a plurality of nucleic acids of said target
array;

d) determining molecular weights of the mass modified nucleic
acids by mass spectrometry; and

e) identifying a sequence of the target.

80. The method of claims 78 or 79 wherein the target is provided from
a biological sample. 

81. The method of claim 80 wherein the sample is obtained from a 
patient. 

82. The method of claims 78 or 79 wherein detection of the target is 
indicative of a disorder in the patient.

83. The method of claims 78 or 79 wherein the disorder is a genetic
defect, a neoplasm or an infection. 

84. An array of nucleic acid probes wherein each probe comprises a first 
strand and a second strand wherein said first strand is hybridized to said
second strand forming a double-stranded portion, a single-stranded portion
and a variable sequence within said single-stranded portion, and said array 
is attached to a solid support comprising a material that facilitates 
volatization of nucleic acids for mass spectrometry. 

85. An array of single-stranded nucleic acid probes wherein each probe 
comprises a constant sequence and a variable sequence which is
determinable, and said array is attached to a solid support comprising a
matrix that facilitates volatization of nucleic acids for mass spectrometry. 

86. The array of claims 84 or 85 wherein the nucleic acid probes are mass 
modified nucleic acid probes. 

87. The array of claims 84 or 85 which contains less than or equal to
about 4^R different probes and R is the length in nucleotides of the variable
sequence. 

88. A kit for detecting a sequence of a target nucleic acid comprising an 
array of nucleic acid probes fixed to a solid support wherein each probe
comprises a double-stranded portion, a single-stranded portion and a
variable sequence within said single-stranded portion, and the solid support 
comprises a matrix chemical that facilitates volatization of nucleic acids for 
mass spectrometry. 

89. A kit for detecting a sequence of a target nucleic acid comprising an 
array of mass modified nucleic acid probes fixed to a solid support wherein
each probe comprises a double-stranded portion, a single-stranded portion
and a variable sequence within said single-stranded portion, and the solid
support comprises a matrix chemical that facilitates volatization of nucleic 
acids for mass spectrometry. 

90. A system for determining sequence information comprising a mass 
spectrometer, a computer and an array of mass modified nucleic acid probes
wherein each probe comprises a single-stranded portion, an optional
double-stranded portion and a variable sequence within said single-stranded
portion,
and wherein said array is attached to a solid support. 

91. A system for determining sequence information comprising a mass 
spectrometer, a computer and an array of nucleic acid probes wherein each
probe comprises a single-stranded portion, an optional double-stranded
portion and a variable sequence within said single-stranded portion, and 
wherein said array is attached to a solid support. 
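
As an illustration of the bound recited in claims 34 and 87, the following
Python sketch enumerates every possible variable sequence of length R over
the four nucleotides and checks that the count equals 4^R. The helper name
variable_sequences and the values of R used here are assumptions made for
this example; they do not appear in the claims.

```python
from itertools import product

# DNA alphabet; an RNA probe array would use U in place of T.
NUCLEOTIDES = "ACGT"


def variable_sequences(r):
    """Return every possible variable sequence of length r.

    Hypothetical helper for illustration only: it lists the 4**r
    distinct sequences that bound the number of different probes.
    """
    return ["".join(p) for p in product(NUCLEOTIDES, repeat=r)]


if __name__ == "__main__":
    for r in (1, 2, 5):
        seqs = variable_sequences(r)
        # The number of enumerated sequences always matches 4**r.
        print(r, len(seqs), 4 ** r)
```

For R = 5 the enumeration yields 1,024 sequences, so an array within the
scope of those claims would hold at most 1,024 different probes.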